Billo Release Agent
A LangGraph-based release automation agent for Billo. Automates the full release pipeline: PR discovery, code review (via Claude Code CLI), Jira ticket management, staging release tracking, CI/CD pipeline triggering with approval gates, and Slack interactive notifications.
Architecture
+--- Azure DevOps Webhook ---+ +--- PR Poller (every 5 min) ---+
| POST /webhooks/azdo | | Scans WATCHED_REPOS for |
| (push-based) | | active PRs (pull-based) |
+------------+---------------+ +------------+------------------+
| |
v v
+---------------------------------------------+
| FastAPI Application |
| /webhooks/azdo /slack/interactions |
| /approvals/* /manual/* /status |
+---------------------+------------------------+
|
v
+---------------------------------------------+
| LangGraph Graphs |
| |
| pr_completed: |
| parse -> fetch -> [has ticket?] |
| no -> Claude generates ticket |
| yes -> Claude code review |
| -> merge -> Jira -> staging -> CI build |
| |
| release: |
| create release PR -> merge -> CI build |
| -> CD release -> [Sandbox approve] |
| -> [Production approve] -> Slack notify |
+---------------------+------------------------+
|
+----------------+----------------+
| | |
v v v
PostgreSQL Azure DevOps Slack (buttons)
- threads - PRs/Pipelines - Notifications
- staging - Builds/Releases - Approvals
- releases
Key Features
| Feature | Description |
|---|---|
| PR Discovery | Webhook-based (push) or polling-based (pull) — or both |
| Auto-Create Jira Ticket | When PR branch has no ticket ID, Claude generates summary + description and creates a Jira Story |
| AI Code Review | Claude Code CLI reviews PRs with full repo context (Read/Glob/Grep), using your subscription |
| CI/CD Integration | Triggers CI builds after merge, polls for completion, handles CD release approval gates |
| Slack Interactive | Approval requests with [Approve]/[Cancel] buttons, CI/CD status notifications |
| Human-in-the-loop | 5 interrupt points where operator confirmation is required before destructive actions |
| Per-repo Versioning | Independent semantic versioning per repository (patch auto-increment) |
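The "patch auto-increment" rule in the last row can be illustrated with a small sketch. This is not the project's versioning.py; the function names and the initial version `0.1.0` are assumptions made for the example.

```python
def bump_patch(version: str) -> str:
    """Return the version with its patch component incremented by one."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def next_version(current: dict[str, str], repo: str, initial: str = "0.1.0") -> str:
    """Pick the next version for one repo without touching any other repo.

    `current` maps repo name -> last released version. `initial` is a
    hypothetical starting version for repos that have no release yet.
    """
    if repo not in current:
        return initial
    return bump_patch(current[repo])
```

Because each repository is looked up by name, bumping one repo never affects another: `next_version({"Billo.Platform.Payment": "1.4.9"}, "Billo.Platform.Payment")` yields `1.4.10`.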
Prerequisites
- Python 3.12+
- PostgreSQL 16+
- uv (recommended) or pip
- Claude Code CLI installed and authenticated (claude in PATH)
- Slack App (for interactive buttons) or Slack Incoming Webhook (for notifications only)
Quick Start
Local Development
# Install dependencies
uv sync --all-extras
# Copy and configure environment
cp .env.example .env
# Edit .env -- fill in all REQUIRED variables
# Start PostgreSQL
docker compose up -d db
# Run the server
uv run uvicorn release_agent.main:app --reload --port 8000
# Verify
curl http://localhost:8000/status
Docker Compose (Production)
cp .env.example .env
# Edit .env -- POSTGRES_PASSWORD, WEBHOOK_SECRET, etc. are required
docker compose up -d
Tables are created automatically on first startup.
Configuration
All configuration is via environment variables. See .env.example for the full list.
Required Variables
| Variable | Description |
|---|---|
| AZDO_ORGANIZATION | Azure DevOps organization name |
| AZDO_PROJECT | Azure DevOps project name |
| AZDO_PAT | Azure DevOps personal access token |
| POSTGRES_DSN | PostgreSQL connection string |
| POSTGRES_PASSWORD | PostgreSQL password (used by docker-compose) |
| JIRA_EMAIL | Jira account email |
| JIRA_API_TOKEN | Jira API token |
| SLACK_WEBHOOK_URL | Slack incoming webhook URL |
| WEBHOOK_SECRET | Shared secret for validating AzDo webhooks (must be non-empty) |
Optional Variables
| Variable | Default | Description |
|---|---|---|
| REPOS_BASE_DIR | "" | Base dir with Billo repos (e.g., /c/Users/yaoji/git/Billo) |
| WATCHED_REPOS | "" | Comma-separated repos to poll (e.g., Billo.Platform.Payment,Billo.Platform.Document.DocumentAnalyser) |
| PR_POLL_ENABLED | False | Enable periodic PR polling |
| PR_POLL_INTERVAL_SECONDS | 300 | Polling interval (5 min) |
| PR_POLL_TARGET_BRANCH | refs/heads/develop | Target branch filter |
| DEFAULT_JIRA_PROJECT | ALLPOST | Jira project key for auto-created tickets |
| AUTO_CREATE_TICKET_ENABLED | True | Auto-create Jira ticket when branch has no ticket ID |
| SLACK_BOT_TOKEN | "" | Slack App bot token (for interactive buttons) |
| SLACK_SIGNING_SECRET | "" | Slack App signing secret (required for /slack/interactions) |
| SLACK_CHANNEL_ID | "" | Channel for interactive messages |
| CI_POLL_INTERVAL_SECONDS | 30 | CI build status poll interval |
| CI_POLL_MAX_WAIT_SECONDS | 1800 | Max wait for CI completion (30 min) |
| OPERATOR_TOKEN | "" | Token for operator endpoints (empty = no auth) |
| JIRA_BASE_URL | https://billolife.atlassian.net | Jira instance URL |
| PORT | 8000 | HTTP server port |
Security Notes
- WEBHOOK_SECRET must be non-empty; empty secret rejects all webhooks
- POSTGRES_PASSWORD has no default in docker-compose; fails if unset
- SLACK_SIGNING_SECRET must be set for /slack/interactions to accept requests (returns 503 if empty)
- Slack signature verification includes 5-minute replay attack prevention
- All secrets use SecretStr and are never logged or included in error responses
- Set OPERATOR_TOKEN in production to protect approval and manual trigger endpoints
API Endpoints
Webhooks
| Method | Path | Auth | Description |
|---|---|---|---|
| POST | /webhooks/azdo | X-Webhook-Secret | Receive Azure DevOps PR webhook events |
| POST | /slack/interactions | Slack Signing Secret | Receive Slack button click callbacks |
Approvals (requires X-Operator-Token when configured)
| Method | Path | Description |
|---|---|---|
| GET | /approvals/pending | List threads awaiting operator approval |
| POST | /approvals/{thread_id} | Submit approval decision (merge/cancel/approve/skip) |
Status and Manual Triggers
| Method | Path | Auth | Description |
|---|---|---|---|
| GET | /status | None | Health check |
| GET | /releases/{repo} | None | List versions for a repo |
| GET | /staging?repo={repo} | None | Current staging release |
| POST | /manual/pr/{pr_id} | X-Operator-Token | Manually trigger PR processing |
| POST | /manual/release | X-Operator-Token | Manually trigger a release |
Graph Workflows
PR Completed
parse_webhook -> fetch_pr_details -> route_after_fetch
|-- merged -----------------> calculate_version -> update_staging -> CI build -> END
|-- active_with_ticket -----> move_jira_code_review -+
|-- active_no_ticket -------> auto_create_ticket ----+
|
run_code_review -> evaluate_review
|-- approve -> [Slack: Merge?] -> merge_pr
|-- request_changes -> notify -> END
-> Jira transitions -> calculate_version
-> update_staging -> CI build -> notify -> END
Release
load_staging -> [Slack: Create release?] -> create_release_pr
-> [Slack: Merge release?] -> merge_release_pr
-> CI build on main -> poll until complete
|-- ci_failed -> notify failure -> END
|-- ci_passed -> wait for CD release -> approval loop:
|-- [Slack: Approve Sandbox?] -> approve -> poll again
|-- [Slack: Approve Production?] -> approve -> poll again
|-- all_deployed -> move tickets to Done
-> Slack release notification -> archive -> END
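The "poll until complete" steps in this flow need both an interval and an upper bound, matching CI_POLL_INTERVAL_SECONDS and CI_POLL_MAX_WAIT_SECONDS. Below is a minimal sketch of such a helper. The project has its own poll_until utility in polling.py; this signature is an assumption, not a copy of it.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable
from typing import Optional, TypeVar

T = TypeVar("T")


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    interval_seconds: float = 30.0,
    max_wait_seconds: float = 1800.0,
) -> Optional[T]:
    """Call `fetch` repeatedly until `is_done` accepts the result.

    Returns the accepted result, or None if the next attempt would fall
    past the deadline.
    """
    deadline = time.monotonic() + max_wait_seconds
    while True:
        result = await fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval_seconds >= deadline:
            return None  # give up instead of polling forever
        await asyncio.sleep(interval_seconds)
```

Returning None on timeout lets the caller route to a failure notification rather than raising from inside the loop.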
Interrupt Points (Slack buttons)
| # | When | Slack Message | Buttons |
|---|---|---|---|
| 1 | After code review approves | PR title + review summary | [Merge] [Cancel] |
| 2 | Before creating release PR | Version + ticket list | [Create] [Cancel] |
| 3 | Before merging release PR | Release PR link | [Merge] [Cancel] |
| 4 | Before triggering pipelines | Pipeline list | [Trigger] [Skip] |
| 5 | Before approving release stage | Stage name + status | [Approve] [Skip] |
PR Polling (Alternative to Webhooks)
When PR_POLL_ENABLED=True, the agent periodically scans all WATCHED_REPOS for
active PRs targeting the configured branch. New PRs not yet tracked in agent_threads
are automatically processed through the pr_completed graph.
This eliminates the need for Azure DevOps webhook configuration and works behind firewalls without public endpoint exposure.
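The "not yet tracked" filter amounts to matching (repo, PR id) pairs against what is already stored. The sketch below shows that pair-wise logic in memory. It is illustrative only; the real dedup runs as a query against agent_threads, and `select_untracked` is a hypothetical name.

```python
from collections.abc import Iterable

PrKey = tuple[str, int]  # (repo_name, pr_id)


def select_untracked(active_prs: Iterable[PrKey], tracked: Iterable[PrKey]) -> list[PrKey]:
    """Return active PRs that have no thread yet, preserving scan order."""
    seen = set(tracked)
    fresh: list[PrKey] = []
    for repo, pr_id in active_prs:
        key = (repo, pr_id)
        # Match on the pair: the same PR id may exist in different repos.
        if key not in seen:
            seen.add(key)  # also guards against duplicates within one scan
            fresh.append(key)
    return fresh
```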
Auto-Create Jira Ticket
When a PR branch has no ticket ID (e.g., chore/update-dependencies instead of
feature/ALLPOST-4028_login-page), the agent automatically:
- Sends the PR diff to Claude Code CLI
- Claude generates a concise ticket summary and description
- Creates a Jira Story in the DEFAULT_JIRA_PROJECT
- Continues the normal workflow with the created ticket
Database Schema
Tables are created automatically on startup:
-- Thread tracking for LangGraph interrupts and PR dedup
agent_threads (thread_id, graph_name, repo_name, pr_id, status, state JSONB,
slack_message_ts, created_at, updated_at)
-- Current in-progress releases (one per repo)
staging_releases (repo, version, started_at, tickets JSONB, updated_at)
-- Completed releases (immutable history)
archived_releases (repo, version, started_at, tickets JSONB, released_at)
Migrating Existing JSON Files
If you have existing release data from the Claude Code skill:
# Dry run
uv run python scripts/migrate_json_to_db.py \
--source ../release-workflow/releases --dry-run
# Execute
uv run python scripts/migrate_json_to_db.py \
--source ../release-workflow/releases \
--dsn "postgresql://agent:password@localhost/agent"
Development
Running Tests
# Run all tests with coverage (1061 tests, 96%+ coverage)
uv run pytest
# Run without coverage (faster)
uv run pytest --no-cov
# Run specific module
uv run pytest tests/graph/test_pr_completed.py -v
Project Structure
src/release_agent/
main.py # FastAPI app, lifespan, task management
config.py # pydantic-settings (all env vars)
state.py # LangGraph ReleaseState TypedDict
exceptions.py # Exception hierarchy
branch_parser.py # Extract ticket ID from branch name
versioning.py # Per-repo version calculation
api/
models.py # HTTP request/response Pydantic models
dependencies.py # FastAPI Depends() + operator auth
webhooks.py # POST /webhooks/azdo
approvals.py # Approval endpoints
status.py # Status, releases, manual triggers
slack_interactions.py # POST /slack/interactions (button callbacks)
graph/
dependencies.py # ToolClients, StagingStore Protocol
postgres_staging_store.py # PostgreSQL-backed store
routing.py # Pure routing functions (route_after_fetch, etc.)
pr_completed.py # PR graph nodes + auto_create_ticket
release.py # Release graph nodes + CI/CD approval loop
full_cycle.py # Subgraph composition
ci_nodes.py # CI trigger, poll, notify nodes
polling.py # Reusable async poll_until utility
models/
pr.py, ticket.py, release.py, pipeline.py
webhook.py, review.py, jira.py, build.py
tools/
azdo.py # Azure DevOps REST client
jira.py # Jira REST client (transitions + create_issue)
slack.py # Slack dual-mode (webhook + Web API)
claude_review.py # Claude Code CLI (review + ticket generation)
_http.py, _retry.py # Shared helpers
services/
pr_poller.py # Background PR polling loop
pr_dedup.py # PR deduplication via agent_threads
scripts/
migrate_json_to_db.py # One-time JSON -> PostgreSQL migration
tests/ # 1061 tests, 96%+ coverage
Docker
# Required: set POSTGRES_PASSWORD and WEBHOOK_SECRET in .env
docker compose up -d
The agent service includes a health check at /status. PostgreSQL uses
pg_isready with service_healthy dependency.
Running Locally (WSL Recommended)
The app runs best on WSL (Ubuntu) because:
- psycopg async requires SelectorEventLoop (incompatible with Windows ProactorEventLoop)
- subprocess.run captures Claude CLI stdout correctly on Linux but not reliably on Windows
- PostgreSQL runs in Docker (accessible from both Windows and WSL via localhost)
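One way to call a blocking subprocess.run from async code without depending on the event loop's own subprocess support is to run it in a worker thread. The sketch below assumes that approach; `run_cli` and its timeout default are hypothetical and not the project's actual wrapper.

```python
import asyncio
import subprocess
import sys


async def run_cli(args: list[str], timeout: float = 600.0) -> str:
    """Run a command in a worker thread and return its captured stdout."""
    completed = await asyncio.to_thread(
        subprocess.run,
        args,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        timeout=timeout,      # raises subprocess.TimeoutExpired when exceeded
        check=True,           # raises CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

For example, `asyncio.run(run_cli([sys.executable, "-c", "print('ok')"]))` returns the child's output while the event loop stays free for other tasks.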
Setup
# 1. Start PostgreSQL (from Windows or WSL)
cd /mnt/c/Users/yaoji/git/Billo/billo-release-agent
docker compose up -d db
# 2. Install uv in WSL (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
# 3. Install dependencies
uv sync --all-extras
# 4. Configure .env
# Key settings:
# CLAUDE_CMD=claude (not claude.cmd — WSL finds it via PATH)
# REPOS_BASE_DIR=/mnt/c/Users/yaoji/git/Billo
# PR_POLL_ENABLED=False (disable during dev to avoid noise)
# SLACK_WEBHOOK_URL= (leave empty during dev)
# 5. Start the server
uv run uvicorn release_agent.main:app --host 0.0.0.0 --port 8080
# 6. Test
curl http://localhost:8080/status
curl -X POST http://localhost:8080/manual/pr/10443
Windows-only (Fallback)
If you must run on Windows directly, use the provided run.py script which sets
WindowsSelectorEventLoopPolicy before starting uvicorn:
uv run python run.py
Note: Claude CLI subprocess may return empty stdout on Windows due to event loop incompatibility. WSL is the recommended approach.
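For readers curious what such a launcher involves, here is a minimal sketch of the policy switch. It is not the contents of the shipped run.py, and `use_selector_event_loop` is a hypothetical name. Note that asyncio event loop policies are deprecated from Python 3.14 onward, so this pattern targets the 3.12 and 3.13 releases.

```python
import asyncio
import sys


def use_selector_event_loop() -> bool:
    """Switch asyncio to the selector loop on Windows; do nothing elsewhere.

    Returns True when the policy was changed.
    """
    if sys.platform != "win32":
        return False
    # WindowsSelectorEventLoopPolicy only exists on Windows builds of Python.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
    return True


# A launcher would call use_selector_event_loop() first and then start the
# server, for example:
#   import uvicorn
#   uvicorn.run("release_agent.main:app", host="0.0.0.0", port=8000)
```

The policy has to be set before the server creates its event loop, which is why it lives in a separate entry point rather than in the application module.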
Performance Note: WSL + /mnt/c
Claude Code CLI with --allowedTools Read,Glob,Grep on /mnt/c (Windows filesystem
mounted in WSL) is very slow. For faster code reviews, either:
- Clone repos to WSL native filesystem (~/git/Billo/) and set REPOS_BASE_DIR=~/git/Billo
- Remove --allowedTools so Claude only reviews the diff text (faster but less thorough)
Slack App Setup
To use interactive buttons (optional — REST API approvals still work without it):
- Create a Slack App at https://api.slack.com/apps
- Enable Interactivity with Request URL: https://<your-domain>/slack/interactions
- Add Bot Token Scopes: chat:write, chat:update
- Install to workspace, get Bot Token (xoxb-...)
- Set SLACK_BOT_TOKEN, SLACK_SIGNING_SECRET, SLACK_CHANNEL_ID in .env
Current Status
Working
- App startup, health check, API endpoints
- Azure DevOps API integration (get PR, list active PRs, get iterations/changes)
- PR info parsing (repo_name, ticket_id, branch extraction)
- Graph execution (full pr_completed flow: parse -> fetch -> route -> review -> notify)
- Database read/write (agent_threads table)
- Slack error handling (empty webhook URL gracefully skipped)
- Claude CLI ticket generation (tested: returns structured JSON)
- Claude CLI code review (tested: returns structured JSON with verdict + issues)
- PR review comments posted to Azure DevOps (inline + summary)
- Node type annotations fixed (RunnableConfig instead of dict)
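Since the review step depends on the CLI returning structured JSON, it helps to validate that output before routing on it. The sketch below assumes the JSON carries `verdict` and `issues` keys and that the verdict is either `approve` or `request_changes` (the two branches of the PR graph). The exact schema is an assumption and `parse_review_output` is a hypothetical name.

```python
import json

ALLOWED_VERDICTS = {"approve", "request_changes"}


def parse_review_output(stdout: str) -> dict:
    """Validate the CLI's JSON output and return its verdict and issues."""
    text = stdout.strip()
    if not text:
        # Empty stdout is the failure mode seen on Windows; fail loudly.
        raise ValueError("Claude CLI returned empty output")
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    verdict = data.get("verdict")
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    issues = data.get("issues", [])
    if not isinstance(issues, list):
        raise ValueError("issues must be a list")
    return {"verdict": verdict, "issues": issues}
```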
Known Issues
| Issue | Severity | Workaround |
|---|---|---|
| Windows: Claude CLI subprocess returns empty stdout | HIGH | Run in WSL |
| WSL + /mnt/c: Claude CLI Read/Glob very slow (10+ min) | MEDIUM | Clone repos to WSL native fs |
| Graph has no LangGraph checkpointer (interrupt not persistent) | MEDIUM | Graphs run to completion or fail; no resume |
| _upsert_thread only writes final state (no intermediate updates) | LOW | Query DB only after graph completes |
| CI poll may run indefinitely (no build to poll in dev) | LOW | Leave PR_POLL_ENABLED=False |
| Config test failures (env var leakage from .env) | LOW | Run with -k "not test_config" |
TODO (Not Yet Implemented)
- Wire LangGraph checkpointer (PostgreSQL) for interrupt persistence
- Interrupt decision validation (currently any resume value proceeds)
- Slack interactive buttons end-to-end (Slack App not yet created)
- CI/CD pipeline trigger end-to-end testing
- Release approval gate detection (check_release_approvals is a stub)
- last_merge_source_commit from AzDo API for safe merge
- Operator token auth testing in production
- Multi-stage Dockerfile for smaller images
- Centralize _upsert_thread into shared api/db.py module
- Remove dead has_ticket routing function
- PR poller dedup query correctness (unnest pair-wise matching untested against real DB)
- archive_release date injection (replace date.today() with config)
- Approval loop max iteration guard (prevent infinite loops)
- Migrate existing release JSON data to PostgreSQL