# Billo Release Agent

A LangGraph-based release automation agent for Billo. Automates the full release
pipeline: PR discovery, code review (via Claude Code CLI), Jira ticket management,
staging release tracking, CI/CD pipeline triggering with approval gates, and Slack
interactive notifications.

## Architecture

```
+--- Azure DevOps Webhook ---+    +--- PR Poller (every 5 min) ---+
|  POST /webhooks/azdo       |    |  Scans WATCHED_REPOS for      |
|  (push-based)              |    |  active PRs (pull-based)      |
+------------+---------------+    +------------+------------------+
             |                                 |
             v                                 v
        +---------------------------------------------+
        |             FastAPI Application             |
        |  /webhooks/azdo  /slack/interactions        |
        |  /approvals/*    /manual/*    /status       |
        +---------------------+-----------------------+
                              |
                              v
        +---------------------------------------------+
        |              LangGraph Graphs               |
        |                                             |
        |  pr_completed:                              |
        |    parse -> fetch -> [has ticket?]          |
        |      no  -> Claude generates ticket         |
        |      yes -> Claude code review              |
        |    -> merge -> Jira -> staging -> CI build  |
        |                                             |
        |  release:                                   |
        |    create release PR -> merge -> CI build   |
        |    -> CD release -> [Sandbox approve]       |
        |    -> [Production approve] -> Slack notify  |
        +---------------------+-----------------------+
                              |
             +----------------+----------------+
             |                |                |
             v                v                v
        PostgreSQL      Azure DevOps      Slack (buttons)
        - threads       - PRs/Pipelines   - Notifications
        - staging       - Builds/Releases - Approvals
        - releases
```

## Key Features

| Feature | Description |
|---------|-------------|
| **PR Discovery** | Webhook-based (push), polling-based (pull), or both |
| **Auto-Create Jira Ticket** | When a PR branch has no ticket ID, Claude generates a summary and description and creates a Jira Story |
| **AI Code Review** | Claude Code CLI reviews PRs on the checked-out PR branch with full repo context (Read/Glob/Grep), using a real `git diff` |
| **CI/CD Integration** | Triggers CI builds after merge, polls for completion, handles CD release approval gates |
| **Slack Interactive** | Approval requests with [Approve]/[Cancel] buttons, CI/CD status notifications |
| **Human-in-the-loop** | 5 interrupt points where operator confirmation is required before destructive actions |
| **Per-repo Versioning** | Independent semantic versioning per repository (patch auto-increment) |

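
The per-repo patch auto-increment can be pictured with a minimal sketch. The function names and the `initial` default below are assumptions for illustration; the real logic lives in `versioning.py` and may differ.

```python
def bump_patch(version: str) -> str:
    """Return the next patch version for a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def next_version(latest_by_repo: dict[str, str], repo: str, initial: str = "1.0.0") -> str:
    """Pick the next version for one repo, independent of all other repos.

    `initial` is a hypothetical starting version for repos with no release yet.
    """
    latest = latest_by_repo.get(repo)
    return initial if latest is None else bump_patch(latest)


latest = {"Billo.Platform.Payment": "2.3.1"}
print(next_version(latest, "Billo.Platform.Payment"))
```

Each repository is looked up by name, so bumping one repo never changes the version of another.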
## Prerequisites

- Python 3.12+
- PostgreSQL 16+
- [uv](https://github.com/astral-sh/uv) (recommended) or pip
- Claude Code CLI installed and authenticated (`claude` in PATH)
- Slack App (for interactive buttons) or Slack Incoming Webhook (for notifications only)

## Quick Start

### Local Development

```bash
# Install dependencies
uv sync --all-extras

# Copy and configure environment
cp .env.example .env
# Edit .env -- fill in all REQUIRED variables

# Start PostgreSQL
docker compose up -d db

# Run the server
uv run uvicorn release_agent.main:app --reload --port 8000

# Verify
curl http://localhost:8000/status
```

### Docker Compose (Production)

```bash
cp .env.example .env
# Edit .env -- POSTGRES_PASSWORD, WEBHOOK_SECRET, etc. are required

docker compose up -d
```

Tables are created automatically on first startup.

## Configuration

All configuration is via environment variables. See `.env.example` for the full list.

### Required Variables

| Variable | Description |
|----------|-------------|
| `AZDO_ORGANIZATION` | Azure DevOps organization name |
| `AZDO_PROJECT` | Azure DevOps project name |
| `AZDO_PAT` | Azure DevOps personal access token |
| `POSTGRES_DSN` | PostgreSQL connection string |
| `POSTGRES_PASSWORD` | PostgreSQL password (used by docker-compose) |
| `JIRA_EMAIL` | Jira account email |
| `JIRA_API_TOKEN` | Jira API token |
| `SLACK_WEBHOOK_URL` | Slack incoming webhook URL |
| `WEBHOOK_SECRET` | Shared secret for validating AzDo webhooks (must be non-empty) |

### Optional Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `REPOS_BASE_DIR` | `""` | Base dir with Billo repos (e.g., `/home/kai/git/billo`). Each repo must be cloned with its AzDo name as the directory name. |
| `WATCHED_REPOS` | `""` | Comma-separated repos to poll (e.g., `Billo.Platform.Payment,Billo.Platform.Document.DocumentAnalyser`) |
| `PR_POLL_ENABLED` | `False` | Enable periodic PR polling |
| `PR_POLL_INTERVAL_SECONDS` | `300` | Polling interval (5 min) |
| `PR_POLL_TARGET_BRANCH` | `refs/heads/develop` | Target branch filter |
| `DEFAULT_JIRA_PROJECT` | `ALLPOST` | Jira project key for auto-created tickets |
| `AUTO_CREATE_TICKET_ENABLED` | `True` | Auto-create Jira ticket when branch has no ticket ID |
| `SLACK_BOT_TOKEN` | `""` | Slack App bot token (for interactive buttons) |
| `SLACK_SIGNING_SECRET` | `""` | Slack App signing secret (required for /slack/interactions) |
| `SLACK_CHANNEL_ID` | `""` | Channel for interactive messages |
| `CI_POLL_INTERVAL_SECONDS` | `30` | CI build status poll interval |
| `CI_POLL_MAX_WAIT_SECONDS` | `1800` | Max wait for CI completion (30 min) |
| `OPERATOR_TOKEN` | `""` | Token for operator endpoints (empty = no auth) |
| `JIRA_BASE_URL` | `https://billolife.atlassian.net` | Jira instance URL |
| `PORT` | `8000` | HTTP server port |

### Security Notes

- `WEBHOOK_SECRET` must be non-empty; with an empty secret, all webhooks are rejected
- `POSTGRES_PASSWORD` has no default in docker-compose; startup fails if it is unset
- `SLACK_SIGNING_SECRET` must be set for `/slack/interactions` to accept requests (returns 503 if empty)
- Slack signature verification rejects requests with a timestamp older than 5 minutes (replay attack prevention)
- All secrets use `SecretStr` and are never logged or included in error responses
- Set `OPERATOR_TOKEN` in production to protect approval and manual trigger endpoints

## API Endpoints

### Webhooks

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/webhooks/azdo` | `X-Webhook-Secret` | Receive Azure DevOps PR webhook events |
| POST | `/slack/interactions` | Slack Signing Secret | Receive Slack button click callbacks |

### Approvals (requires `X-Operator-Token` when configured)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/approvals/pending` | List threads awaiting operator approval |
| POST | `/approvals/{thread_id}` | Submit approval decision (merge/cancel/approve/skip) |

### Status and Manual Triggers

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/status` | None | Health check |
| GET | `/releases/{repo}` | None | List versions for a repo |
| GET | `/staging?repo={repo}` | None | Current staging release |
| POST | `/manual/pr/{pr_id}` | `X-Operator-Token` | Manually trigger PR processing |
| POST | `/manual/release` | `X-Operator-Token` | Manually trigger a release |

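
As an illustration of calling an operator endpoint from Python, the sketch below builds the `POST /manual/pr/{pr_id}` request using only the standard library. The helper name is hypothetical, and the response format is not documented here, so the example stops at constructing the request.

```python
import urllib.request


def build_manual_pr_request(
    base_url: str, pr_id: int, operator_token: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a manual PR trigger request.

    The X-Operator-Token header is only attached when a token is given,
    matching the "empty = no auth" behaviour of OPERATOR_TOKEN.
    """
    headers = {"X-Operator-Token": operator_token} if operator_token else {}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/manual/pr/{pr_id}",
        method="POST",
        headers=headers,
    )


request = build_manual_pr_request("http://localhost:8000", 10443, "my-operator-token")
# To actually send it: urllib.request.urlopen(request)
```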
## Graph Workflows

### PR Completed

```
parse_webhook -> fetch_pr_details -> route_after_fetch
  |                   |
  |                   +-- (local repo available?)
  |                         yes -> git fetch + checkout PR branch + git diff
  |                         no  -> AzDo iteration API (file list only)
  |
  |-- merged -----------------> calculate_version -> update_staging -> CI build -> END
  |-- active_with_ticket -----> move_jira_code_review -+
  |-- active_no_ticket -------> auto_create_ticket ----+
                                                       |
                                      run_code_review -> evaluate_review
                                        |-- approve -> [Slack: Merge?] -> merge_pr
                                        |-- request_changes -> notify -> END
                                              -> restore branch to develop
                                              -> Jira transitions -> calculate_version
                                              -> update_staging -> CI build -> notify -> END
```

### Release

```
load_staging -> [Slack: Create release?] -> create_release_pr
  -> [Slack: Merge release?] -> merge_release_pr
  -> CI build on main -> poll until complete
      |-- ci_failed -> notify failure -> END
      |-- ci_passed -> wait for CD release -> approval loop:
            |-- [Slack: Approve Sandbox?] -> approve -> poll again
            |-- [Slack: Approve Production?] -> approve -> poll again
            |-- all_deployed -> move tickets to Done
                  -> Slack release notification -> archive -> END
```

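
The "poll until complete" step can be sketched as a small async helper bounded by an interval and a maximum wait, in the spirit of `CI_POLL_INTERVAL_SECONDS` and `CI_POLL_MAX_WAIT_SECONDS`. The signature and the `PollTimeout` exception are assumptions; the project's own `poll_until` in `polling.py` may look different.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class PollTimeout(Exception):
    """Raised when the condition is not met within max_wait_seconds."""


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    interval_seconds: float = 30.0,
    max_wait_seconds: float = 1800.0,
) -> T:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + max_wait_seconds
    while True:
        value = await fetch()
        if is_done(value):
            return value
        # Stop before sleeping if the next attempt would land past the deadline.
        if loop.time() + interval_seconds > deadline:
            raise PollTimeout(f"condition not met within {max_wait_seconds}s")
        await asyncio.sleep(interval_seconds)
```

A hard deadline like this is one way to keep a poll from running indefinitely when no build ever appears.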
### Interrupt Points (Slack buttons)

| # | When | Slack Message | Buttons |
|---|------|---------------|---------|
| 1 | After code review approves | PR title + review summary | [Merge] [Cancel] |
| 2 | Before creating release PR | Version + ticket list | [Create] [Cancel] |
| 3 | Before merging release PR | Release PR link | [Merge] [Cancel] |
| 4 | Before triggering pipelines | Pipeline list | [Trigger] [Skip] |
| 5 | Before approving release stage | Stage name + status | [Approve] [Skip] |

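
One way to validate a resume decision against the buttons in the table above is a per-interrupt allow-list. This is only a sketch: the interrupt keys and decision strings are hypothetical names derived from the button labels, not identifiers taken from the codebase.

```python
# Hypothetical interrupt names mapped to the decisions their buttons allow.
ALLOWED_DECISIONS: dict[str, frozenset[str]] = {
    "merge_pr": frozenset({"merge", "cancel"}),
    "create_release_pr": frozenset({"create", "cancel"}),
    "merge_release_pr": frozenset({"merge", "cancel"}),
    "trigger_pipelines": frozenset({"trigger", "skip"}),
    "approve_release_stage": frozenset({"approve", "skip"}),
}


def validate_decision(interrupt: str, decision: str) -> str:
    """Return the normalized decision, or raise if it is not allowed here."""
    allowed = ALLOWED_DECISIONS.get(interrupt)
    if allowed is None:
        raise KeyError(f"unknown interrupt: {interrupt!r}")
    normalized = decision.strip().lower()
    if normalized not in allowed:
        raise ValueError(
            f"{normalized!r} is not valid for {interrupt!r}; expected one of {sorted(allowed)}"
        )
    return normalized
```

Rejecting unknown values at the API boundary means a graph is never resumed with a decision that the interrupted node does not understand.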
## PR Polling (Alternative to Webhooks)

When `PR_POLL_ENABLED=True`, the agent periodically scans all `WATCHED_REPOS` for
active PRs targeting the configured branch. New PRs not yet tracked in `agent_threads`
are automatically processed through the `pr_completed` graph.

This eliminates the need for Azure DevOps webhook configuration and works behind
firewalls without public endpoint exposure.

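
The deduplication step can be pictured as a filter over `(repo_name, pr_id)` pairs. The sketch below works on in-memory collections with hypothetical names; the agent itself checks against the `agent_threads` table.

```python
from collections.abc import Iterable
from typing import NamedTuple


class PullRequestRef(NamedTuple):
    repo_name: str
    pr_id: int


def select_untracked(
    active: Iterable[PullRequestRef], tracked: Iterable[PullRequestRef]
) -> list[PullRequestRef]:
    """Return active PRs that are not tracked yet, preserving order.

    Matching is on the (repo_name, pr_id) pair, so the same PR number in a
    different repository still counts as new.
    """
    seen = set(tracked)
    fresh: list[PullRequestRef] = []
    for pr in active:
        if pr not in seen:
            seen.add(pr)  # also drops duplicates within one poll result
            fresh.append(pr)
    return fresh
```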
## Auto-Create Jira Ticket

When a PR branch has no ticket ID (e.g., `chore/update-dependencies` instead of
`feature/ALLPOST-4028_login-page`), the agent automatically:

1. Sends the PR diff to Claude Code CLI
2. Claude generates a concise ticket summary and description
3. Creates a Jira Story in the `DEFAULT_JIRA_PROJECT`
4. Continues the normal workflow with the created ticket

## Database Schema

Tables are created automatically on startup:

```sql
-- Thread tracking for LangGraph interrupts and PR dedup
agent_threads (thread_id, graph_name, repo_name, pr_id, status, state JSONB,
               slack_message_ts, created_at, updated_at)

-- Current in-progress releases (one per repo)
staging_releases (repo, version, started_at, tickets JSONB, updated_at)

-- Completed releases (immutable history)
archived_releases (repo, version, started_at, tickets JSONB, released_at)
```

## Migrating Existing JSON Files

If you have existing release data from the Claude Code skill:

```bash
# Dry run
uv run python scripts/migrate_json_to_db.py \
  --source ../release-workflow/releases --dry-run

# Execute
uv run python scripts/migrate_json_to_db.py \
  --source ../release-workflow/releases \
  --dsn "postgresql://agent:password@localhost/agent"
```

## Development

### Running Tests

```bash
# Run all tests with coverage (1061 tests, 96%+ coverage)
uv run pytest

# Run without coverage (faster)
uv run pytest --no-cov

# Run specific module
uv run pytest tests/graph/test_pr_completed.py -v
```

### Project Structure

```
src/release_agent/
  main.py                      # FastAPI app, lifespan, task management
  config.py                    # pydantic-settings (all env vars)
  state.py                     # LangGraph ReleaseState TypedDict
  exceptions.py                # Exception hierarchy
  branch_parser.py             # Extract ticket ID from branch name
  versioning.py                # Per-repo version calculation
  api/
    models.py                  # HTTP request/response Pydantic models
    dependencies.py            # FastAPI Depends() + operator auth
    webhooks.py                # POST /webhooks/azdo
    approvals.py               # Approval endpoints
    status.py                  # Status, releases, manual triggers
    slack_interactions.py      # POST /slack/interactions (button callbacks)
  graph/
    dependencies.py            # ToolClients, StagingStore Protocol
    postgres_staging_store.py  # PostgreSQL-backed store
    routing.py                 # Pure routing functions (route_after_fetch, etc.)
    pr_completed.py            # PR graph nodes + auto_create_ticket
    release.py                 # Release graph nodes + CI/CD approval loop
    full_cycle.py              # Subgraph composition
    ci_nodes.py                # CI trigger, poll, notify nodes
    polling.py                 # Reusable async poll_until utility
  models/
    pr.py, ticket.py, release.py, pipeline.py
    webhook.py, review.py, jira.py, build.py
  tools/
    azdo.py                    # Azure DevOps REST client
    jira.py                    # Jira REST client (transitions + create_issue)
    slack.py                   # Slack dual-mode (webhook + Web API)
    claude_review.py           # Claude Code CLI (review + ticket generation)
    git_local.py               # Local git ops (fetch, checkout PR branch, diff)
    _http.py, _retry.py        # Shared helpers
  services/
    pr_poller.py               # Background PR polling loop
    pr_dedup.py                # PR deduplication via agent_threads
scripts/
  migrate_json_to_db.py        # One-time JSON -> PostgreSQL migration
tests/                         # 1061 tests, 96%+ coverage
```

## Docker

```bash
# Required: set POSTGRES_PASSWORD and WEBHOOK_SECRET in .env
docker compose up -d
```

The agent service includes a health check at `/status`. PostgreSQL uses
`pg_isready` with `service_healthy` dependency.

## Running Locally (WSL Recommended)

The app runs best on **WSL (Ubuntu)** because:

- `psycopg` async requires `SelectorEventLoop` (incompatible with Windows `ProactorEventLoop`)
- `subprocess.run` captures Claude CLI stdout correctly on Linux but not reliably on Windows
- PostgreSQL runs in Docker (accessible from both Windows and WSL via `localhost`)

### Setup

```bash
# 1. Clone the project to WSL native filesystem (NOT /mnt/c/)
cd ~/git/billo
git clone <repo-url> billo-release-agent
cd billo-release-agent

# 2. Start PostgreSQL (from Windows or WSL)
docker compose up -d db

# 3. Install uv in WSL (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# 4. Create venv and install dependencies
#    IMPORTANT: Run from the WSL-native path, not /mnt/c/.
#    If .venv was created from /mnt/c/, delete and recreate:
#      rm -rf .venv && uv venv --python python3.12
uv sync --all-extras

# 5. Configure .env
#    Key settings:
#      CLAUDE_CMD=claude
#      REPOS_BASE_DIR=~/git/billo   (WSL native path, not /mnt/c/)
#      PR_POLL_ENABLED=False        (disable during dev to avoid noise)
#      SLACK_WEBHOOK_URL=           (leave empty during dev)

# 6. Start the server
uv run uvicorn release_agent.main:app --host 0.0.0.0 --port 8080

# 7. Test
curl http://localhost:8080/status
# If OPERATOR_TOKEN is set, also pass: -H "X-Operator-Token: <token>"
curl -X POST http://localhost:8080/manual/pr/10443
```

### Important: venv Must Be Created from WSL Path

If the `.venv` was created while the working directory was `/mnt/c/...`, the
uvicorn shebang will point to the Windows-mounted path. This causes the server
to load stale code from the Windows filesystem instead of your WSL edits.

To fix: `rm -rf .venv && uv venv --python python3.12 && uv sync --all-extras`

### Windows-only (Fallback)

If you must run on Windows directly, use the provided `run.py` script which sets
`WindowsSelectorEventLoopPolicy` before starting uvicorn:

```bash
uv run python run.py
```

Note: Claude CLI subprocess may return empty stdout on Windows due to event loop
incompatibility. WSL is the recommended approach.

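
For orientation, a launcher of this kind might look roughly like the sketch below. It is not the repository's actual `run.py`: the function names, host, and port are assumptions, and whether uvicorn honours a policy set beforehand may depend on the uvicorn version in use.

```python
import asyncio
import sys


def use_selector_event_loop_on_windows() -> bool:
    """Switch to the selector event loop policy on Windows; no-op elsewhere."""
    if sys.platform != "win32":
        return False
    # WindowsSelectorEventLoopPolicy only exists on Windows builds of Python.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    return True


def main() -> None:
    # Imported lazily so the policy helper can be used without uvicorn installed.
    import uvicorn

    use_selector_event_loop_on_windows()
    # Host and port are illustrative values, not taken from the real script.
    uvicorn.run("release_agent.main:app", host="127.0.0.1", port=8000)
```

Calling `main()` from a script entry point starts the server with the policy already in place.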
### Local Git Checkout for Code Review

When `REPOS_BASE_DIR` is set and the repo is cloned locally, the agent will:

1. `git fetch origin` to get the latest remote state
2. `git checkout` the PR source branch
3. `git diff origin/develop...HEAD` to generate a real diff (not just file names)
4. Run Claude Code CLI with `cwd` set to the repo on the PR branch
5. `git checkout develop` to restore the branch after review

This gives Claude full codebase context on the actual PR branch, producing much
more thorough reviews than the AzDo iteration API (which only returns file paths).

**Performance**: Clone repos to WSL native filesystem (`~/git/billo/`), not
`/mnt/c/`. Claude Code CLI with `--allowedTools Read,Glob,Grep` on `/mnt/c`
is very slow (10+ minutes per review vs seconds on native fs).

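
The five steps above can be sketched with `subprocess` and a `try`/`finally` so the branch is restored even when the review fails. This is an illustration under assumptions: the function names, the `review` callback, and the injectable `git` runner are hypothetical and do not mirror `git_local.py`.

```python
import subprocess
from collections.abc import Callable, Sequence
from pathlib import Path

GitRunner = Callable[[Sequence[str], Path], str]


def run_git(args: Sequence[str], repo_dir: Path) -> str:
    """Run one git command in repo_dir and return its stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout


def review_on_pr_branch(
    repo_dir: Path,
    source_branch: str,
    review: Callable[[str, Path], str],  # stands in for the Claude CLI call
    target_branch: str = "develop",
    git: GitRunner = run_git,
) -> str:
    git(["fetch", "origin"], repo_dir)                    # step 1
    git(["checkout", source_branch], repo_dir)            # step 2
    try:
        # Three-dot range: changes on HEAD since it diverged from the target.
        diff = git(["diff", f"origin/{target_branch}...HEAD"], repo_dir)  # step 3
        return review(diff, repo_dir)                     # step 4
    finally:
        git(["checkout", target_branch], repo_dir)        # step 5, always runs
```

Because the checkout mutates the shared working tree, a sketch like this is only safe when reviews for a given repository run one at a time.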
## Slack App Setup

To use interactive buttons (optional; REST API approvals still work without it):

1. Create a Slack App at https://api.slack.com/apps
2. Enable **Interactivity** with Request URL: `https://<your-domain>/slack/interactions`
3. Add Bot Token Scopes: `chat:write`, `chat:update`
4. Install to workspace, get Bot Token (`xoxb-...`)
5. Set `SLACK_BOT_TOKEN`, `SLACK_SIGNING_SECRET`, `SLACK_CHANNEL_ID` in `.env`

## Current Status

### Working

- App startup, health check, API endpoints
- Azure DevOps API integration (get PR, list active PRs, get iterations/changes)
- PR info parsing (repo_name, ticket_id, branch extraction)
- Graph execution (full pr_completed flow: parse -> fetch -> route -> review -> notify)
- Database read/write (agent_threads table)
- Slack error handling (empty webhook URL gracefully skipped)
- Claude CLI ticket generation (tested: returns structured JSON)
- Claude CLI code review (tested: returns structured JSON with verdict + issues)
- PR review comments posted to Azure DevOps (inline + summary)
- Local git checkout + real `git diff` for PR code review (with fallback to AzDo API)
- Branch restore to develop after review completes
- Node type annotations fixed (`RunnableConfig` instead of `dict`)

### Known Issues

| Issue | Severity | Workaround |
|-------|----------|------------|
| Windows: Claude CLI subprocess returns empty stdout | HIGH | Run in WSL |
| WSL: venv created from /mnt/c/ loads stale code | HIGH | Delete .venv, recreate from WSL native path |
| WSL + /mnt/c: Claude CLI Read/Glob very slow (10+ min) | MEDIUM | Clone repos to WSL native fs |
| Graph has no LangGraph checkpointer (interrupt not persistent) | MEDIUM | Graphs run to completion or fail; no resume |
| `_upsert_thread` only writes final state (no intermediate updates) | LOW | Query DB only after graph completes |
| CI poll may run indefinitely (no build to poll in dev) | LOW | Leave `PR_POLL_ENABLED=False` |
| Config test failures (env var leakage from .env) | LOW | Run with `-k "not test_config"` |
| Local git checkout mutates repo working tree | LOW | Review runs sequentially; branch restored after |

### TODO (Not Yet Implemented)

- [ ] Wire LangGraph checkpointer (PostgreSQL) for interrupt persistence
- [ ] Interrupt decision validation (currently any resume value proceeds)
- [ ] Slack interactive buttons end-to-end (Slack App not yet created)
- [ ] CI/CD pipeline trigger end-to-end testing
- [ ] Release approval gate detection (check_release_approvals is a stub)
- [ ] `last_merge_source_commit` from AzDo API for safe merge
- [ ] Operator token auth testing in production
- [ ] Multi-stage Dockerfile for smaller images
- [ ] Centralize `_upsert_thread` into shared `api/db.py` module
- [ ] Remove dead `has_ticket` routing function
- [ ] PR poller dedup query correctness (unnest pair-wise matching untested against real DB)
- [ ] `archive_release` date injection (replace `date.today()` with config)
- [ ] Approval loop max iteration guard (prevent infinite loops)
- [ ] Migrate existing release JSON data to PostgreSQL