# Billo Release Agent

A LangGraph-based release automation agent for Billo. Automates the full release pipeline: PR discovery, code review (via Claude Code CLI), Jira ticket management, staging release tracking, CI/CD pipeline triggering with approval gates, and Slack interactive notifications.

## Architecture

```
+--- Azure DevOps Webhook ---+  +--- PR Poller (every 5 min) ---+
|  POST /webhooks/azdo       |  |  Scans WATCHED_REPOS for      |
|  (push-based)              |  |  active PRs (pull-based)      |
+------------+---------------+  +------------+------------------+
             |                               |
             v                               v
+----------------------------------------------+
|           FastAPI Application                |
|  /webhooks/azdo   /slack/interactions        |
|  /approvals/*     /manual/*     /status      |
+---------------------+------------------------+
                      |
                      v
+----------------------------------------------+
|           LangGraph Graphs                   |
|                                              |
|  pr_completed:                               |
|    parse -> fetch -> [has ticket?]           |
|      no  -> Claude generates ticket          |
|      yes -> Claude code review               |
|    -> merge -> Jira -> staging -> CI build   |
|                                              |
|  release:                                    |
|    create release PR -> merge -> CI build    |
|    -> CD release -> [Sandbox approve]        |
|    -> [Production approve] -> Slack notify   |
+---------------------+------------------------+
                      |
     +----------------+----------------+
     |                |                |
     v                v                v
 PostgreSQL      Azure DevOps      Slack (buttons)
 - threads       - PRs/Pipelines   - Notifications
 - staging       - Builds/Releases - Approvals
 - releases
```

## Key Features

| Feature | Description |
|---------|-------------|
| **PR Discovery** | Webhook-based (push), polling-based (pull), or both |
| **Auto-Create Jira Ticket** | When PR branch has no ticket ID, Claude generates summary + description and creates a Jira Story |
| **AI Code Review** | Claude Code CLI reviews PRs on the checked-out PR branch with full repo context (Read/Glob/Grep), using real `git diff` |
| **CI/CD Integration** | Triggers CI builds after merge, polls for completion, handles CD release approval gates |
| **Slack Interactive** | Approval requests with [Approve]/[Cancel] buttons, CI/CD status notifications |
| **Human-in-the-loop** | 5 interrupt points where operator confirmation is required before destructive actions |
| **Per-repo Versioning** | Independent semantic versioning per repository (patch auto-increment) |

## Prerequisites

- Python 3.12+
- PostgreSQL 16+
- [uv](https://github.com/astral-sh/uv) (recommended) or pip
- Claude Code CLI installed and authenticated (`claude` in PATH)
- Slack App (for interactive buttons) or Slack Incoming Webhook (for notifications only)

## Quick Start

### Local Development

```bash
# Install dependencies
uv sync --all-extras

# Copy and configure environment
cp .env.example .env
# Edit .env -- fill in all REQUIRED variables

# Start PostgreSQL
docker compose up -d db

# Run the server
uv run uvicorn release_agent.main:app --reload --port 8000

# Verify
curl http://localhost:8000/status
```

### Docker Compose (Production)

```bash
cp .env.example .env
# Edit .env -- POSTGRES_PASSWORD, WEBHOOK_SECRET, etc. are required
docker compose up -d
```

Tables are created automatically on first startup.

## Configuration

All configuration is via environment variables. See `.env.example` for the full list.

### Required Variables

| Variable | Description |
|----------|-------------|
| `AZDO_ORGANIZATION` | Azure DevOps organization name |
| `AZDO_PROJECT` | Azure DevOps project name |
| `AZDO_PAT` | Azure DevOps personal access token |
| `POSTGRES_DSN` | PostgreSQL connection string |
| `POSTGRES_PASSWORD` | PostgreSQL password (used by docker-compose) |
| `JIRA_EMAIL` | Jira account email |
| `JIRA_API_TOKEN` | Jira API token |
| `SLACK_WEBHOOK_URL` | Slack incoming webhook URL |
| `WEBHOOK_SECRET` | Shared secret for validating AzDo webhooks (must be non-empty) |

### Optional Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `REPOS_BASE_DIR` | `""` | Base dir with Billo repos (e.g., `/home/kai/git/billo`). Each repo must be cloned with its AzDo name as the directory name. |
| `WATCHED_REPOS` | `""` | Comma-separated repos to poll (e.g., `Billo.Platform.Payment,Billo.Platform.Document.DocumentAnalyser`) |
| `PR_POLL_ENABLED` | `False` | Enable periodic PR polling |
| `PR_POLL_INTERVAL_SECONDS` | `300` | Polling interval (5 min) |
| `PR_POLL_TARGET_BRANCH` | `refs/heads/develop` | Target branch filter |
| `DEFAULT_JIRA_PROJECT` | `ALLPOST` | Jira project key for auto-created tickets |
| `AUTO_CREATE_TICKET_ENABLED` | `True` | Auto-create Jira ticket when branch has no ticket ID |
| `SLACK_BOT_TOKEN` | `""` | Slack App bot token (for interactive buttons) |
| `SLACK_SIGNING_SECRET` | `""` | Slack App signing secret (required for /slack/interactions) |
| `SLACK_CHANNEL_ID` | `""` | Channel for interactive messages |
| `CI_POLL_INTERVAL_SECONDS` | `30` | CI build status poll interval |
| `CI_POLL_MAX_WAIT_SECONDS` | `1800` | Max wait for CI completion (30 min) |
| `OPERATOR_TOKEN` | `""` | Token for operator endpoints (empty = no auth) |
| `JIRA_BASE_URL` | `https://billolife.atlassian.net` | Jira instance URL |
| `PORT` | `8000` | HTTP server port |

### Security Notes

- `WEBHOOK_SECRET` must be non-empty; empty secret rejects all webhooks
- `POSTGRES_PASSWORD` has no default in docker-compose; fails if unset
- `SLACK_SIGNING_SECRET` must be set for `/slack/interactions` to accept requests (returns 503 if empty)
- Slack signature verification includes 5-minute replay attack prevention
- All secrets use `SecretStr` and are never logged or included in error responses
- Set `OPERATOR_TOKEN` in production to protect approval and manual trigger endpoints

## API Endpoints

### Webhooks

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/webhooks/azdo` | `X-Webhook-Secret` | Receive Azure DevOps PR webhook events |
| POST | `/slack/interactions` | Slack Signing Secret | Receive Slack button click callbacks |

### Approvals (requires `X-Operator-Token` when configured)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/approvals/pending` | List threads awaiting operator approval |
| POST | `/approvals/{thread_id}` | Submit approval decision (merge/cancel/approve/skip) |

### Status and Manual Triggers

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/status` | None | Health check |
| GET | `/releases/{repo}` | None | List versions for a repo |
| GET | `/staging?repo={repo}` | None | Current staging release |
| POST | `/manual/pr/{pr_id}` | `X-Operator-Token` | Manually trigger PR processing |
| POST | `/manual/release` | `X-Operator-Token` | Manually trigger a release |

## Graph Workflows

### PR Completed

```
parse_webhook -> fetch_pr_details -> route_after_fetch
  |                    |
  |                    +-- (local repo available?)
  |                          yes -> git fetch + checkout PR branch + git diff
  |                          no  -> AzDo iteration API (file list only)
  |
  |-- merged -----------------> calculate_version -> update_staging -> CI build -> END
  |-- active_with_ticket -----> move_jira_code_review -+
  |-- active_no_ticket -------> auto_create_ticket ----+
                                                       |
                                  run_code_review -> evaluate_review
                                    |-- approve -> [Slack: Merge?] -> merge_pr
                                    |-- request_changes -> notify -> END
                                  -> restore branch to develop -> Jira transitions
                                  -> calculate_version -> update_staging
                                  -> CI build -> notify -> END
```

### Release

```
load_staging -> [Slack: Create release?] -> create_release_pr
  -> [Slack: Merge release?] -> merge_release_pr
  -> CI build on main -> poll until complete
       |-- ci_failed -> notify failure -> END
       |-- ci_passed -> wait for CD release -> approval loop:
             |-- [Slack: Approve Sandbox?] -> approve -> poll again
             |-- [Slack: Approve Production?] -> approve -> poll again
             |-- all_deployed -> move tickets to Done
                   -> Slack release notification -> archive -> END
```

### Interrupt Points (Slack buttons)

| # | When | Slack Message | Buttons |
|---|------|--------------|---------|
| 1 | After code review approves | PR title + review summary | [Merge] [Cancel] |
| 2 | Before creating release PR | Version + ticket list | [Create] [Cancel] |
| 3 | Before merging release PR | Release PR link | [Merge] [Cancel] |
| 4 | Before triggering pipelines | Pipeline list | [Trigger] [Skip] |
| 5 | Before approving release stage | Stage name + status | [Approve] [Skip] |

## PR Polling (Alternative to Webhooks)

When `PR_POLL_ENABLED=True`, the agent periodically scans all `WATCHED_REPOS` for active PRs targeting the configured branch. New PRs not yet tracked in `agent_threads` are automatically processed through the `pr_completed` graph.

This eliminates the need for Azure DevOps webhook configuration and works behind firewalls without public endpoint exposure.

## Auto-Create Jira Ticket

When a PR branch has no ticket ID (e.g., `chore/update-dependencies` instead of `feature/ALLPOST-4028_login-page`), the agent automatically:

1. Sends the PR diff to Claude Code CLI
2. Claude generates a concise ticket summary and description
3. Creates a Jira Story in the `DEFAULT_JIRA_PROJECT`
4. Continues the normal workflow with the created ticket

## Database Schema

Tables are created automatically on startup:

```sql
-- Thread tracking for LangGraph interrupts and PR dedup
agent_threads (thread_id, graph_name, repo_name, pr_id, status,
               state JSONB, slack_message_ts, created_at, updated_at)

-- Current in-progress releases (one per repo)
staging_releases (repo, version, started_at, tickets JSONB, updated_at)

-- Completed releases (immutable history)
archived_releases (repo, version, started_at, tickets JSONB, released_at)
```

## Migrating Existing JSON Files

If you have existing release data from the Claude Code skill:

```bash
# Dry run
uv run python scripts/migrate_json_to_db.py \
  --source ../release-workflow/releases --dry-run

# Execute
uv run python scripts/migrate_json_to_db.py \
  --source ../release-workflow/releases \
  --dsn "postgresql://agent:password@localhost/agent"
```

## Development

### Running Tests

```bash
# Run all tests with coverage (1061 tests, 96%+ coverage)
uv run pytest

# Run without coverage (faster)
uv run pytest --no-cov

# Run specific module
uv run pytest tests/graph/test_pr_completed.py -v
```

### Project Structure

```
src/release_agent/
  main.py                      # FastAPI app, lifespan, task management
  config.py                    # pydantic-settings (all env vars)
  state.py                     # LangGraph ReleaseState TypedDict
  exceptions.py                # Exception hierarchy
  branch_parser.py             # Extract ticket ID from branch name
  versioning.py                # Per-repo version calculation
  api/
    models.py                  # HTTP request/response Pydantic models
    dependencies.py            # FastAPI Depends() + operator auth
    webhooks.py                # POST /webhooks/azdo
    approvals.py               # Approval endpoints
    status.py                  # Status, releases, manual triggers
    slack_interactions.py      # POST /slack/interactions (button callbacks)
  graph/
    dependencies.py            # ToolClients, StagingStore Protocol
    postgres_staging_store.py  # PostgreSQL-backed store
    routing.py                 # Pure routing functions (route_after_fetch, etc.)
    pr_completed.py            # PR graph nodes + auto_create_ticket
    release.py                 # Release graph nodes + CI/CD approval loop
    full_cycle.py              # Subgraph composition
    ci_nodes.py                # CI trigger, poll, notify nodes
    polling.py                 # Reusable async poll_until utility
  models/
    pr.py, ticket.py, release.py, pipeline.py
    webhook.py, review.py, jira.py, build.py
  tools/
    azdo.py                    # Azure DevOps REST client
    jira.py                    # Jira REST client (transitions + create_issue)
    slack.py                   # Slack dual-mode (webhook + Web API)
    claude_review.py           # Claude Code CLI (review + ticket generation)
    git_local.py               # Local git ops (fetch, checkout PR branch, diff)
    _http.py, _retry.py        # Shared helpers
  services/
    pr_poller.py               # Background PR polling loop
    pr_dedup.py                # PR deduplication via agent_threads
scripts/
  migrate_json_to_db.py        # One-time JSON -> PostgreSQL migration
tests/                         # 1061 tests, 96%+ coverage
```

## Docker

```bash
# Required: set POSTGRES_PASSWORD and WEBHOOK_SECRET in .env
docker compose up -d
```

The agent service includes a health check at `/status`. PostgreSQL uses `pg_isready` with `service_healthy` dependency.

## Running Locally (WSL Recommended)

The app runs best on **WSL (Ubuntu)** because:

- `psycopg` async requires `SelectorEventLoop` (incompatible with Windows `ProactorEventLoop`)
- `subprocess.run` captures Claude CLI stdout correctly on Linux but not reliably on Windows
- PostgreSQL runs in Docker (accessible from both Windows and WSL via `localhost`)

### Setup

```bash
# 1. Clone the project to WSL native filesystem (NOT /mnt/c/)
cd ~/git/billo
git clone billo-release-agent
cd billo-release-agent

# 2. Start PostgreSQL (from Windows or WSL)
docker compose up -d db

# 3. Install uv in WSL (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# 4. Create venv and install dependencies
# IMPORTANT: Run from the WSL-native path, not /mnt/c/.
# If .venv was created from /mnt/c/, delete and recreate:
#   rm -rf .venv && uv venv --python python3.12
uv sync --all-extras

# 5. Configure .env
# Key settings:
#   CLAUDE_CMD=claude
#   REPOS_BASE_DIR=~/git/billo   (WSL native path, not /mnt/c/)
#   PR_POLL_ENABLED=False        (disable during dev to avoid noise)
#   SLACK_WEBHOOK_URL=           (leave empty during dev)

# 6. Start the server
uv run uvicorn release_agent.main:app --host 0.0.0.0 --port 8080

# 7. Test
curl http://localhost:8080/status
curl -X POST http://localhost:8080/manual/pr/10443
```

### Important: venv Must Be Created from WSL Path

If the `.venv` was created while the working directory was `/mnt/c/...`, the uvicorn shebang will point to the Windows-mounted path. This causes the server to load stale code from the Windows filesystem instead of your WSL edits.

To fix: `rm -rf .venv && uv venv --python python3.12 && uv sync --all-extras`

### Windows-only (Fallback)

If you must run on Windows directly, use the provided `run.py` script which sets `WindowsSelectorEventLoopPolicy` before starting uvicorn:

```bash
uv run python run.py
```

Note: Claude CLI subprocess may return empty stdout on Windows due to event loop incompatibility. WSL is the recommended approach.

### Local Git Checkout for Code Review

When `REPOS_BASE_DIR` is set and the repo is cloned locally, the agent will:

1. `git fetch origin` to get the latest remote state
2. `git checkout` the PR source branch
3. `git diff origin/develop...HEAD` to generate a real diff (not just file names)
4. Run Claude Code CLI with `cwd` set to the repo on the PR branch
5. `git checkout develop` to restore the branch after review

This gives Claude full codebase context on the actual PR branch, producing much more thorough reviews than the AzDo iteration API (which only returns file paths).

**Performance**: Clone repos to WSL native filesystem (`~/git/billo/`), not `/mnt/c/`. Claude Code CLI with `--allowedTools Read,Glob,Grep` on `/mnt/c` is very slow (10+ minutes per review vs seconds on native fs).

## Slack App Setup

To use interactive buttons (optional; REST API approvals still work without it):

1. Create a Slack App at https://api.slack.com/apps
2. Enable **Interactivity** with Request URL: `https:///slack/interactions`
3. Add Bot Token Scopes: `chat:write`, `chat:update`
4. Install to workspace, get Bot Token (`xoxb-...`)
5. Set `SLACK_BOT_TOKEN`, `SLACK_SIGNING_SECRET`, `SLACK_CHANNEL_ID` in `.env`

## Current Status

### Working

- App startup, health check, API endpoints
- Azure DevOps API integration (get PR, list active PRs, get iterations/changes)
- PR info parsing (repo_name, ticket_id, branch extraction)
- Graph execution (full pr_completed flow: parse -> fetch -> route -> review -> notify)
- Database read/write (agent_threads table)
- Slack error handling (empty webhook URL gracefully skipped)
- Claude CLI ticket generation (tested: returns structured JSON)
- Claude CLI code review (tested: returns structured JSON with verdict + issues)
- PR review comments posted to Azure DevOps (inline + summary)
- Local git checkout + real `git diff` for PR code review (with fallback to AzDo API)
- Branch restore to develop after review completes
- Node type annotations fixed (`RunnableConfig` instead of `dict`)

### Known Issues

| Issue | Severity | Workaround |
|-------|----------|------------|
| Windows: Claude CLI subprocess returns empty stdout | HIGH | Run in WSL |
| WSL: venv created from /mnt/c/ loads stale code | HIGH | Delete .venv, recreate from WSL native path |
| WSL + /mnt/c: Claude CLI Read/Glob very slow (10+ min) | MEDIUM | Clone repos to WSL native fs |
| Graph has no LangGraph checkpointer (interrupt not persistent) | MEDIUM | Graphs run to completion or fail; no resume |
| `_upsert_thread` only writes final state (no intermediate updates) | LOW | Query DB only after graph completes |
| CI poll may run indefinitely (no build to poll in dev) | LOW | Leave `PR_POLL_ENABLED=False` |
| Config test failures (env var leakage from .env) | LOW | Run with `-k "not test_config"` |
| Local git checkout mutates repo working tree | LOW | Review runs sequentially; branch restored after |

### TODO (Not Yet Implemented)

- [ ] Wire LangGraph checkpointer (PostgreSQL) for interrupt persistence
- [ ] Interrupt decision validation (currently any resume value proceeds)
- [ ] Slack interactive buttons end-to-end (Slack App not yet created)
- [ ] CI/CD pipeline trigger end-to-end testing
- [ ] Release approval gate detection (check_release_approvals is a stub)
- [ ] `last_merge_source_commit` from AzDo API for safe merge
- [ ] Operator token auth testing in production
- [ ] Multi-stage Dockerfile for smaller images
- [ ] Centralize `_upsert_thread` into shared `api/db.py` module
- [ ] Remove dead `has_ticket` routing function
- [ ] PR poller dedup query correctness (unnest pair-wise matching untested against real DB)
- [ ] `archive_release` date injection (replace `date.today()` with config)
- [ ] Approval loop max iteration guard (prevent infinite loops)
- [ ] Migrate existing release JSON data to PostgreSQL