# Billo Release Agent

A LangGraph-based release automation agent for Billo. Automates the full release pipeline: PR discovery, code review (via Claude Code CLI), Jira ticket management, staging release tracking, CI/CD pipeline triggering with approval gates, and Slack interactive notifications.

## Architecture

```
+--- Azure DevOps Webhook ---+     +--- PR Poller (every 5 min) ---+
|  POST /webhooks/azdo       |     |  Scans WATCHED_REPOS for      |
|  (push-based)              |     |  active PRs (pull-based)      |
+------------+---------------+     +------------+------------------+
             |                                  |
             v                                  v
+---------------------------------------------+
|             FastAPI Application             |
|  /webhooks/azdo  /slack/interactions        |
|  /approvals/*  /manual/*  /status           |
+---------------------+-----------------------+
                      |
                      v
+---------------------------------------------+
|              LangGraph Graphs               |
|                                             |
|  pr_completed:                              |
|    parse -> fetch -> [has ticket?]          |
|      no  -> Claude generates ticket         |
|      yes -> Claude code review              |
|    -> merge -> Jira -> staging -> CI build  |
|                                             |
|  release:                                   |
|    create release PR -> merge -> CI build   |
|    -> CD release -> [Sandbox approve]       |
|    -> [Production approve] -> Slack notify  |
+---------------------+-----------------------+
                      |
     +----------------+----------------+
     |                |                |
     v                v                v
PostgreSQL      Azure DevOps       Slack (buttons)
- threads       - PRs/Pipelines    - Notifications
- staging       - Builds/Releases  - Approvals
- releases
```

## Key Features

| Feature | Description |
|---------|-------------|
| **PR Discovery** | Webhook-based (push), polling-based (pull), or both |
| **Auto-Create Jira Ticket** | When PR branch has no ticket ID, Claude generates summary + description and creates a Jira Story |
| **AI Code Review** | Claude Code CLI reviews PRs with full repo context (Read/Glob/Grep), using your subscription |
| **CI/CD Integration** | Triggers CI builds after merge, polls for completion, handles CD release approval gates |
| **Slack Interactive** | Approval requests with [Approve]/[Cancel] buttons, CI/CD status notifications |
| **Human-in-the-loop** | 5 interrupt points where operator confirmation is required before destructive actions |
| **Per-repo Versioning** | Independent semantic versioning per repository (patch auto-increment) |

## Prerequisites

- Python 3.12+
- PostgreSQL 16+
- [uv](https://github.com/astral-sh/uv) (recommended) or pip
- Claude Code CLI installed and authenticated (`claude` in PATH)
- Slack App (for interactive buttons) or Slack Incoming Webhook (for notifications only)

## Quick Start

### Local Development

```bash
# Install dependencies
uv sync --all-extras

# Copy and configure environment
cp .env.example .env
# Edit .env -- fill in all REQUIRED variables

# Start PostgreSQL
docker compose up -d db

# Run the server
uv run uvicorn release_agent.main:app --reload --port 8000

# Verify
curl http://localhost:8000/status
```

### Docker Compose (Production)

```bash
cp .env.example .env
# Edit .env -- POSTGRES_PASSWORD, WEBHOOK_SECRET, etc. are required

docker compose up -d
```

Tables are created automatically on first startup.

## Configuration

All configuration is via environment variables. See `.env.example` for the full list.

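
Because every setting comes from the environment, a quick pre-flight check can catch an incomplete `.env` before the server starts. The following is a minimal sketch using only the standard library; it is not the project's `config.py` (which uses pydantic-settings). The helper name `missing_required` is hypothetical, and the variable names are copied from the Required Variables table in this README.

```python
import os
from typing import Mapping

# Names copied from the "Required Variables" table in this README.
REQUIRED_VARS = (
    "AZDO_ORGANIZATION",
    "AZDO_PROJECT",
    "AZDO_PAT",
    "POSTGRES_DSN",
    "POSTGRES_PASSWORD",
    "JIRA_EMAIL",
    "JIRA_API_TOKEN",
    "SLACK_WEBHOOK_URL",
    "WEBHOOK_SECRET",
)


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return required variable names that are unset or blank (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required()
    if missing:
        raise SystemExit("Missing required variables: " + ", ".join(missing))
    print("All required variables are set.")
```

Passing the environment as a mapping keeps the check easy to exercise without touching the real process environment.
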
### Required Variables

| Variable | Description |
|----------|-------------|
| `AZDO_ORGANIZATION` | Azure DevOps organization name |
| `AZDO_PROJECT` | Azure DevOps project name |
| `AZDO_PAT` | Azure DevOps personal access token |
| `POSTGRES_DSN` | PostgreSQL connection string |
| `POSTGRES_PASSWORD` | PostgreSQL password (used by docker-compose) |
| `JIRA_EMAIL` | Jira account email |
| `JIRA_API_TOKEN` | Jira API token |
| `SLACK_WEBHOOK_URL` | Slack incoming webhook URL |
| `WEBHOOK_SECRET` | Shared secret for validating AzDo webhooks (must be non-empty) |

### Optional Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `REPOS_BASE_DIR` | `""` | Base dir with Billo repos (e.g., `/c/Users/yaoji/git/Billo`) |
| `WATCHED_REPOS` | `""` | Comma-separated repos to poll (e.g., `Billo.Platform.Payment,Billo.Platform.Document.DocumentAnalyser`) |
| `PR_POLL_ENABLED` | `False` | Enable periodic PR polling |
| `PR_POLL_INTERVAL_SECONDS` | `300` | Polling interval (5 min) |
| `PR_POLL_TARGET_BRANCH` | `refs/heads/develop` | Target branch filter |
| `DEFAULT_JIRA_PROJECT` | `ALLPOST` | Jira project key for auto-created tickets |
| `AUTO_CREATE_TICKET_ENABLED` | `True` | Auto-create Jira ticket when branch has no ticket ID |
| `SLACK_BOT_TOKEN` | `""` | Slack App bot token (for interactive buttons) |
| `SLACK_SIGNING_SECRET` | `""` | Slack App signing secret (required for /slack/interactions) |
| `SLACK_CHANNEL_ID` | `""` | Channel for interactive messages |
| `CI_POLL_INTERVAL_SECONDS` | `30` | CI build status poll interval |
| `CI_POLL_MAX_WAIT_SECONDS` | `1800` | Max wait for CI completion (30 min) |
| `OPERATOR_TOKEN` | `""` | Token for operator endpoints (empty = no auth) |
| `JIRA_BASE_URL` | `https://billolife.atlassian.net` | Jira instance URL |
| `PORT` | `8000` | HTTP server port |

### Security Notes

- `WEBHOOK_SECRET` must be non-empty; empty secret rejects all webhooks
- `POSTGRES_PASSWORD` has no default in docker-compose; fails if unset
- `SLACK_SIGNING_SECRET` must be set for `/slack/interactions` to accept requests (returns 503 if empty)
- Slack signature verification includes 5-minute replay attack prevention
- All secrets use `SecretStr` and are never logged or included in error responses
- Set `OPERATOR_TOKEN` in production to protect approval and manual trigger endpoints

## API Endpoints

### Webhooks

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| POST | `/webhooks/azdo` | `X-Webhook-Secret` | Receive Azure DevOps PR webhook events |
| POST | `/slack/interactions` | Slack Signing Secret | Receive Slack button click callbacks |

### Approvals (requires `X-Operator-Token` when configured)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/approvals/pending` | List threads awaiting operator approval |
| POST | `/approvals/{thread_id}` | Submit approval decision (merge/cancel/approve/skip) |

### Status and Manual Triggers

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| GET | `/status` | None | Health check |
| GET | `/releases/{repo}` | None | List versions for a repo |
| GET | `/staging?repo={repo}` | None | Current staging release |
| POST | `/manual/pr/{pr_id}` | `X-Operator-Token` | Manually trigger PR processing |
| POST | `/manual/release` | `X-Operator-Token` | Manually trigger a release |

## Graph Workflows

### PR Completed

```
parse_webhook -> fetch_pr_details -> route_after_fetch
  |-- merged -----------------> calculate_version -> update_staging -> CI build -> END
  |-- active_with_ticket -----> move_jira_code_review -+
  |-- active_no_ticket -------> auto_create_ticket ----+
                                                       |
                                  run_code_review -> evaluate_review
                                    |-- approve -> [Slack: Merge?] -> merge_pr
                                    |-- request_changes -> notify -> END
                                          -> Jira transitions -> calculate_version
                                          -> update_staging -> CI build -> notify -> END
```

### Release

```
load_staging -> [Slack: Create release?] -> create_release_pr
  -> [Slack: Merge release?] -> merge_release_pr
  -> CI build on main -> poll until complete
      |-- ci_failed -> notify failure -> END
      |-- ci_passed -> wait for CD release -> approval loop:
            |-- [Slack: Approve Sandbox?] -> approve -> poll again
            |-- [Slack: Approve Production?] -> approve -> poll again
            |-- all_deployed -> move tickets to Done
                  -> Slack release notification -> archive -> END
```

### Interrupt Points (Slack buttons)

| # | When | Slack Message | Buttons |
|---|------|--------------|---------|
| 1 | After code review approves | PR title + review summary | [Merge] [Cancel] |
| 2 | Before creating release PR | Version + ticket list | [Create] [Cancel] |
| 3 | Before merging release PR | Release PR link | [Merge] [Cancel] |
| 4 | Before triggering pipelines | Pipeline list | [Trigger] [Skip] |
| 5 | Before approving release stage | Stage name + status | [Approve] [Skip] |

## PR Polling (Alternative to Webhooks)

When `PR_POLL_ENABLED=True`, the agent periodically scans all `WATCHED_REPOS` for active PRs targeting the configured branch. New PRs not yet tracked in `agent_threads` are automatically processed through the `pr_completed` graph.

This eliminates the need for Azure DevOps webhook configuration and works behind firewalls without public endpoint exposure.

## Auto-Create Jira Ticket

When a PR branch has no ticket ID (e.g., `chore/update-dependencies` instead of `feature/ALLPOST-4028_login-page`), the agent automatically:

1. Sends the PR diff to Claude Code CLI
2. Claude generates a concise ticket summary and description
3. Creates a Jira Story in the `DEFAULT_JIRA_PROJECT`
4. Continues the normal workflow with the created ticket

## Database Schema

Tables are created automatically on startup:

```sql
-- Thread tracking for LangGraph interrupts and PR dedup
agent_threads (thread_id, graph_name, repo_name, pr_id, status,
               state JSONB, slack_message_ts, created_at, updated_at)

-- Current in-progress releases (one per repo)
staging_releases (repo, version, started_at, tickets JSONB, updated_at)

-- Completed releases (immutable history)
archived_releases (repo, version, started_at, tickets JSONB, released_at)
```

## Migrating Existing JSON Files

If you have existing release data from the Claude Code skill:

```bash
# Dry run
uv run python scripts/migrate_json_to_db.py \
  --source ../release-workflow/releases --dry-run

# Execute
uv run python scripts/migrate_json_to_db.py \
  --source ../release-workflow/releases \
  --dsn "postgresql://agent:password@localhost/agent"
```

## Development

### Running Tests

```bash
# Run all tests with coverage (1061 tests, 96%+ coverage)
uv run pytest

# Run without coverage (faster)
uv run pytest --no-cov

# Run specific module
uv run pytest tests/graph/test_pr_completed.py -v
```

### Project Structure

```
src/release_agent/
  main.py                      # FastAPI app, lifespan, task management
  config.py                    # pydantic-settings (all env vars)
  state.py                     # LangGraph ReleaseState TypedDict
  exceptions.py                # Exception hierarchy
  branch_parser.py             # Extract ticket ID from branch name
  versioning.py                # Per-repo version calculation
  api/
    models.py                  # HTTP request/response Pydantic models
    dependencies.py            # FastAPI Depends() + operator auth
    webhooks.py                # POST /webhooks/azdo
    approvals.py               # Approval endpoints
    status.py                  # Status, releases, manual triggers
    slack_interactions.py      # POST /slack/interactions (button callbacks)
  graph/
    dependencies.py            # ToolClients, StagingStore Protocol
    postgres_staging_store.py  # PostgreSQL-backed store
    routing.py                 # Pure routing functions (route_after_fetch, etc.)
    pr_completed.py            # PR graph nodes + auto_create_ticket
    release.py                 # Release graph nodes + CI/CD approval loop
    full_cycle.py              # Subgraph composition
    ci_nodes.py                # CI trigger, poll, notify nodes
    polling.py                 # Reusable async poll_until utility
  models/
    pr.py, ticket.py, release.py, pipeline.py
    webhook.py, review.py, jira.py, build.py
  tools/
    azdo.py                    # Azure DevOps REST client
    jira.py                    # Jira REST client (transitions + create_issue)
    slack.py                   # Slack dual-mode (webhook + Web API)
    claude_review.py           # Claude Code CLI (review + ticket generation)
    _http.py, _retry.py        # Shared helpers
  services/
    pr_poller.py               # Background PR polling loop
    pr_dedup.py                # PR deduplication via agent_threads
scripts/
  migrate_json_to_db.py        # One-time JSON -> PostgreSQL migration
tests/                         # 1061 tests, 96%+ coverage
```

## Docker

```bash
# Required: set POSTGRES_PASSWORD and WEBHOOK_SECRET in .env
docker compose up -d
```

The agent service includes a health check at `/status`. PostgreSQL uses `pg_isready` with `service_healthy` dependency.

## Running Locally (WSL Recommended)

The app runs best on **WSL (Ubuntu)** because:

- `psycopg` async requires `SelectorEventLoop` (incompatible with Windows `ProactorEventLoop`)
- `subprocess.run` captures Claude CLI stdout correctly on Linux but not reliably on Windows
- PostgreSQL runs in Docker (accessible from both Windows and WSL via `localhost`)

### Setup

```bash
# 1. Start PostgreSQL (from Windows or WSL)
cd /mnt/c/Users/yaoji/git/Billo/billo-release-agent
docker compose up -d db

# 2. Install uv in WSL (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# 3. Install dependencies
uv sync --all-extras

# 4. Configure .env
# Key settings:
#   CLAUDE_CMD=claude            (not claude.cmd; WSL finds it via PATH)
#   REPOS_BASE_DIR=/mnt/c/Users/yaoji/git/Billo
#   PR_POLL_ENABLED=False        (disable during dev to avoid noise)
#   SLACK_WEBHOOK_URL=           (leave empty during dev)

# 5. Start the server
uv run uvicorn release_agent.main:app --host 0.0.0.0 --port 8080

# 6. Test
curl http://localhost:8080/status
curl -X POST http://localhost:8080/manual/pr/10443
```

### Windows-only (Fallback)

If you must run on Windows directly, use the provided `run.py` script which sets `WindowsSelectorEventLoopPolicy` before starting uvicorn:

```bash
uv run python run.py
```

Note: Claude CLI subprocess may return empty stdout on Windows due to event loop incompatibility. WSL is the recommended approach.

### Performance Note: WSL + /mnt/c

Claude Code CLI with `--allowedTools Read,Glob,Grep` on `/mnt/c` (Windows filesystem mounted in WSL) is very slow. For faster code reviews, either:

1. **Clone repos to WSL native filesystem** (`~/git/Billo/`) and set `REPOS_BASE_DIR=~/git/Billo`
2. **Remove `--allowedTools`** so Claude only reviews the diff text (faster but less thorough)

## Slack App Setup

To use interactive buttons (optional; REST API approvals still work without it):

1. Create a Slack App at https://api.slack.com/apps
2. Enable **Interactivity** with Request URL: `https:///slack/interactions`
3. Add Bot Token Scopes: `chat:write`, `chat:update`
4. Install to workspace, get Bot Token (`xoxb-...`)
5. Set `SLACK_BOT_TOKEN`, `SLACK_SIGNING_SECRET`, `SLACK_CHANNEL_ID` in `.env`

## Current Status

### Working

- App startup, health check, API endpoints
- Azure DevOps API integration (get PR, list active PRs, get iterations/changes)
- PR info parsing (repo_name, ticket_id, branch extraction)
- Graph execution (full pr_completed flow: parse -> fetch -> route -> review -> notify)
- Database read/write (agent_threads table)
- Slack error handling (empty webhook URL gracefully skipped)
- Claude CLI ticket generation (tested: returns structured JSON)
- Claude CLI code review (tested: returns structured JSON with verdict + issues)
- PR review comments posted to Azure DevOps (inline + summary)
- Node type annotations fixed (`RunnableConfig` instead of `dict`)

### Known Issues

| Issue | Severity | Workaround |
|-------|----------|------------|
| Windows: Claude CLI subprocess returns empty stdout | HIGH | Run in WSL |
| WSL + /mnt/c: Claude CLI Read/Glob very slow (10+ min) | MEDIUM | Clone repos to WSL native fs |
| Graph has no LangGraph checkpointer (interrupt not persistent) | MEDIUM | Graphs run to completion or fail; no resume |
| `_upsert_thread` only writes final state (no intermediate updates) | LOW | Query DB only after graph completes |
| CI poll may run indefinitely (no build to poll in dev) | LOW | Leave `PR_POLL_ENABLED=False` |
| Config test failures (env var leakage from .env) | LOW | Run with `-k "not test_config"` |

### TODO (Not Yet Implemented)

- [ ] Wire LangGraph checkpointer (PostgreSQL) for interrupt persistence
- [ ] Interrupt decision validation (currently any resume value proceeds)
- [ ] Slack interactive buttons end-to-end (Slack App not yet created)
- [ ] CI/CD pipeline trigger end-to-end testing
- [ ] Release approval gate detection (check_release_approvals is a stub)
- [ ] `last_merge_source_commit` from AzDo API for safe merge
- [ ] Operator token auth testing in production
- [ ] Multi-stage Dockerfile for smaller images
- [ ] Centralize `_upsert_thread` into shared `api/db.py` module
- [ ] Remove dead `has_ticket` routing function
- [ ] PR poller dedup query correctness (unnest pair-wise matching untested against real DB)
- [ ] `archive_release` date injection (replace `date.today()` with config)
- [ ] Approval loop max iteration guard (prevent infinite loops)
- [ ] Migrate existing release JSON data to PostgreSQL
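
Two items above concern loops that may never terminate: the CI poll in Known Issues and the approval loop guard in the TODO list. One way to bound such loops is a polling helper with both a deadline and an attempt cap. The sketch below is illustrative only and is not the contents of `graph/polling.py`; the signature, the `PollTimeout` exception, and the `max_attempts` parameter are assumptions. The default interval and wait mirror `CI_POLL_INTERVAL_SECONDS` and `CI_POLL_MAX_WAIT_SECONDS`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class PollTimeout(Exception):
    """Raised when polling stops before the condition is met (hypothetical)."""


async def poll_until(
    fetch: Callable[[], Awaitable[T]],
    is_done: Callable[[T], bool],
    *,
    interval_seconds: float = 30.0,    # mirrors CI_POLL_INTERVAL_SECONDS
    max_wait_seconds: float = 1800.0,  # mirrors CI_POLL_MAX_WAIT_SECONDS
    max_attempts: Optional[int] = None,
) -> T:
    """Call fetch() repeatedly until is_done(result) is true.

    Gives up with PollTimeout once the deadline would be exceeded or the
    attempt cap is reached, so the caller's loop cannot run forever.
    """
    deadline = time.monotonic() + max_wait_seconds
    attempts = 0
    while True:
        result = await fetch()
        attempts += 1
        if is_done(result):
            return result
        if max_attempts is not None and attempts >= max_attempts:
            raise PollTimeout(f"gave up after {attempts} attempts")
        if time.monotonic() + interval_seconds > deadline:
            raise PollTimeout(f"not done within {max_wait_seconds} seconds")
        await asyncio.sleep(interval_seconds)
```

A graph node could catch `PollTimeout` and route to a failure notification instead of blocking; that routing is likewise an assumption about how the graphs might use such a helper.
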