fix: runtime fixes for WSL deployment and integration testing

- Fix RunnableConfig type annotations (dict -> RunnableConfig) for LangGraph compatibility
- Fix AzDo PR URL parsing (_links.web.href fallback + remoteUrl construction)
- Fix AzDo diff endpoint (use iterations/changes instead of non-existent diffs API)
- Fix _format_diff to read changeEntries field (not changes)
- Fix URL encoding for project names with spaces (Billo App Platform)
- Fix Claude CLI subprocess on Windows (replace asyncio.create_subprocess_exec with subprocess.run in a thread pool)
- Fix SlackClient to handle empty webhook URL gracefully
- Fix notify_request_changes to catch all exceptions (not just ReleaseAgentError)
- Fix JSON parsing to strip whitespace before json.loads
- Add CLAUDE_CMD config field for cross-platform CLI path
- Add run.py for Windows SelectorEventLoop workaround
- Add db port mapping in docker-compose for local dev
- Add comprehensive README sections: WSL setup, known issues, TODO list
Author: Yaojia Wang
Date: 2026-03-24 23:05:04 +01:00
Commit: b67cbcfd93 (parent f5c2733cfb)
13 changed files with 272 additions and 88 deletions


@@ -330,6 +330,62 @@ docker compose up -d
The agent service includes a health check at `/status`. PostgreSQL uses
`pg_isready` with `service_healthy` dependency.
## Running Locally (WSL Recommended)
The app runs best on **WSL (Ubuntu)** because:
- `psycopg` async requires `SelectorEventLoop` (incompatible with Windows `ProactorEventLoop`)
- `subprocess.run` captures Claude CLI stdout correctly on Linux but not reliably on Windows
- PostgreSQL runs in Docker (accessible from both Windows and WSL via `localhost`)
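
Taken together, the first two points explain why this commit moves the Claude CLI call off asyncio's subprocess API. On Windows, `asyncio.SelectorEventLoop` (the loop psycopg needs) cannot start subprocesses at all. Below is a minimal sketch of the replacement pattern. It assumes Python 3.9+ for `asyncio.to_thread`. The name `run_cli` and the 600-second timeout are illustrative placeholders, not the project's actual function.

```python
from __future__ import annotations

import asyncio
import subprocess
import sys


async def run_cli(args: list[str], cwd: str | None = None, timeout: float = 600.0) -> str:
    """Run a CLI to completion in a worker thread and return its stdout.

    asyncio.SelectorEventLoop cannot create subprocesses on Windows, so instead of
    asyncio.create_subprocess_exec the blocking subprocess.run call is handed to
    the default thread pool with asyncio.to_thread; the event loop stays free.
    """
    completed = await asyncio.to_thread(
        subprocess.run,
        args,
        cwd=cwd,
        capture_output=True,  # collect stdout/stderr instead of inheriting the console
        encoding="utf-8",     # decode as UTF-8 text rather than the Windows code page
        timeout=timeout,      # placeholder; raises subprocess.TimeoutExpired when exceeded
        check=False,          # inspect returncode ourselves so stderr ends up in the error
    )
    if completed.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with code {completed.returncode}: {completed.stderr.strip()}"
        )
    return completed.stdout


if __name__ == "__main__":
    # Stand-in for the Claude CLI: the current Python interpreter printing a line.
    output = asyncio.run(run_cli([sys.executable, "-c", "print('hello')"]))
    print(output.strip())
```

Because the blocking call runs in a worker thread, the same code works under either event loop. The cost is one pool thread held for each review in progress.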
### Setup
```bash
# 1. Start PostgreSQL (from Windows or WSL)
cd /mnt/c/Users/yaoji/git/Billo/billo-release-agent
docker compose up -d db

# 2. Install uv in WSL (if needed)
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"

# 3. Install dependencies
uv sync --all-extras

# 4. Configure .env
# Key settings:
#   CLAUDE_CMD=claude          (not claude.cmd; WSL finds it via PATH)
#   REPOS_BASE_DIR=/mnt/c/Users/yaoji/git/Billo
#   PR_POLL_ENABLED=False      (disable during dev to avoid noise)
#   SLACK_WEBHOOK_URL=         (leave empty during dev)

# 5. Start the server
uv run uvicorn release_agent.main:app --host 0.0.0.0 --port 8080

# 6. Test
curl http://localhost:8080/status
curl -X POST http://localhost:8080/manual/pr/10443
```
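
Step 4's `CLAUDE_CMD` setting lets one codebase call `claude` under WSL and `claude.cmd` on Windows. To make a wrong value fail at startup instead of mid-review, one option is to resolve it once with `shutil.which`. This is a hedged sketch. `resolve_claude_cmd` and its fallback to `claude` are assumptions, not the project's config code.

```python
from __future__ import annotations

import os
import shutil


def resolve_claude_cmd(env: dict[str, str] | None = None) -> str:
    """Resolve CLAUDE_CMD to an executable path, failing fast if it is missing.

    Hypothetical helper: only the CLAUDE_CMD name comes from the project config.
    """
    env = dict(os.environ) if env is None else env
    cmd = env.get("CLAUDE_CMD", "").strip() or "claude"  # assumed default
    # shutil.which checks an explicit path directly, otherwise searches PATH;
    # on Windows it also tries PATHEXT suffixes, so "claude" can match claude.cmd.
    # If env has no PATH entry, shutil.which falls back to the process PATH.
    resolved = shutil.which(cmd, path=env.get("PATH"))
    if resolved is None:
        raise FileNotFoundError(f"CLAUDE_CMD={cmd!r} was not found on PATH")
    return resolved
```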
### Windows-only (Fallback)
If you must run on Windows directly, use the provided `run.py` script, which sets
`WindowsSelectorEventLoopPolicy` before starting uvicorn:
```bash
uv run python run.py
```
Note: Claude CLI subprocess may return empty stdout on Windows due to event loop
incompatibility. WSL is the recommended approach.
### Performance Note: WSL + /mnt/c
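Claude Code CLI with `--allowedTools Read,Glob,Grep` is very slow on `/mnt/c` (the Windows filesystem
mounted in WSL). Reviews can take 10+ minutes, because file access across the Windows/Linux boundary is
much slower than on WSL's native filesystem. For faster code reviews, either:
1. **Clone repos to the WSL native filesystem** (`~/git/Billo/`) and point `REPOS_BASE_DIR` at that directory with an absolute path such as `/home/<user>/git/Billo` (a literal `~` in `.env` may not be expanded)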
2. **Remove `--allowedTools`** so Claude only reviews the diff text (faster but less thorough)
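
Option 1 can be checked at startup. The hypothetical helper below expands `~` and `$VARS` explicitly, since values read from `.env` are generally not shell-expanded. It then warns when the resolved directory sits under `/mnt/`, assuming WSL's default automount root.

```python
from __future__ import annotations

import os
import warnings
from pathlib import Path


def check_repos_base_dir(raw: str) -> Path:
    """Expand ~ and $VARS in REPOS_BASE_DIR and warn if it points at a Windows drive.

    Hypothetical startup check; assumes the default WSL automount root (/mnt).
    """
    path = Path(os.path.expandvars(os.path.expanduser(raw)))
    # Paths such as /mnt/c/... are served from the Windows filesystem, which makes
    # Claude's Read/Glob/Grep tool calls far slower than on the WSL native disk.
    if path.is_relative_to("/mnt"):
        warnings.warn(
            f"REPOS_BASE_DIR={path} is on a Windows mount; clone repos under the "
            "WSL home directory for faster reviews",
            stacklevel=2,
        )
    return path
```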
## Slack App Setup
To use interactive buttons (optional — REST API approvals still work without it):
@@ -339,3 +395,46 @@ To use interactive buttons (optional — REST API approvals still work without i
3. Add Bot Token Scopes: `chat:write`, `chat:update`
4. Install to workspace, get Bot Token (`xoxb-...`)
5. Set `SLACK_BOT_TOKEN`, `SLACK_SIGNING_SECRET`, `SLACK_CHANNEL_ID` in `.env`
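
The steps above are only needed for interactive buttons. Plain notifications go through `SLACK_WEBHOOK_URL`, which the local setup leaves empty, and this commit makes the Slack client skip sending in that case. Below is a minimal standard-library sketch of that behaviour. The class and method names are hypothetical, not the project's `SlackClient` API.

```python
from __future__ import annotations

import json
import logging
import urllib.request

logger = logging.getLogger(__name__)


class WebhookNotifier:
    """Hypothetical stand-in for the webhook path of the project's SlackClient.

    An empty SLACK_WEBHOOK_URL means "notifications disabled": send() logs and
    returns False instead of raising, so the graph keeps running in dev.
    """

    def __init__(self, webhook_url: str | None, timeout: float = 10.0) -> None:
        self.webhook_url = (webhook_url or "").strip()
        self.timeout = timeout

    @property
    def enabled(self) -> bool:
        return bool(self.webhook_url)

    def send(self, text: str) -> bool:
        if not self.enabled:
            logger.info("Slack webhook not configured; skipping message")
            return False
        # Incoming webhooks accept a JSON body with a "text" field.
        request = urllib.request.Request(
            self.webhook_url,
            data=json.dumps({"text": text}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return 200 <= response.status < 300
```

This commit also makes `notify_request_changes` catch all exceptions, so a webhook that is configured but failing would not abort the graph either.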
## Current Status
### Working
- App startup, health check, API endpoints
- Azure DevOps API integration (get PR, list active PRs, get iterations/changes)
- PR info parsing (repo_name, ticket_id, branch extraction)
- Graph execution (full pr_completed flow: parse -> fetch -> route -> review -> notify)
- Database read/write (agent_threads table)
- Slack error handling (empty webhook URL gracefully skipped)
- Claude CLI ticket generation (tested: returns structured JSON)
- Claude CLI code review (tested: returns structured JSON with verdict + issues)
- PR review comments posted to Azure DevOps (inline + summary)
- Node type annotations fixed (`RunnableConfig` instead of `dict`)
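
The Azure DevOps integration listed above relies on three fixes from this commit:

- percent-encoding project names that contain spaces,
- calling the iteration changes endpoint instead of a diffs API,
- reading `changeEntries` from its response.

A minimal sketch of the URL construction and response parsing follows, assuming API version 7.1. The organization URL, repository name, and helper names are placeholders, not the project's code.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import quote

API_VERSION = "7.1"  # assumption: any version that serves the iteration changes endpoint


def _segment(value: object) -> str:
    # safe="" also encodes "/"; a space becomes %20 (e.g. "Billo App Platform").
    return quote(str(value), safe="")


def iteration_changes_url(
    org_url: str, project: str, repo: str, pr_id: int, iteration_id: int
) -> str:
    """URL for Azure DevOps 'Pull Request Iteration Changes - Get'."""
    return (
        f"{org_url.rstrip('/')}/{_segment(project)}/_apis/git/repositories/{_segment(repo)}"
        f"/pullRequests/{pr_id}/iterations/{iteration_id}/changes?api-version={API_VERSION}"
    )


def format_change_entries(payload: dict[str, Any]) -> str:
    """One line per changed file; the response lists them under 'changeEntries'."""
    lines = []
    for entry in payload.get("changeEntries", []):
        item = entry.get("item", {})
        lines.append(f"{entry.get('changeType', 'unknown')}: {item.get('path', '?')}")
    return "\n".join(lines)
```

`format_change_entries` ignores any `changes` key, which is the field `_format_diff` read before this fix. The iteration ID would come from the PR's iterations list (usually the latest iteration); that request is not shown.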
### Known Issues
| Issue | Severity | Workaround |
|-------|----------|------------|
| Windows: Claude CLI subprocess returns empty stdout | HIGH | Run in WSL |
| WSL + /mnt/c: Claude CLI Read/Glob very slow (10+ min) | MEDIUM | Clone repos to WSL native fs |
| Graph has no LangGraph checkpointer (interrupt not persistent) | MEDIUM | Graphs run to completion or fail; no resume |
| `_upsert_thread` only writes final state (no intermediate updates) | LOW | Query DB only after graph completes |
| CI poll may run indefinitely (no build to poll in dev) | LOW | Leave `PR_POLL_ENABLED=False` |
| Config test failures (env var leakage from .env) | LOW | Run pytest with `-k "not test_config"` |
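
For the config test leakage, a common fix is to make the environment and the `.env` path explicit inputs. Tests can then pass an empty environment and no file. The standard-library sketch below uses `Settings`, `read_env_file`, and `load_settings` as hypothetical stand-ins for the project's config module. If that module is built on pydantic-settings (an assumption), instantiating it with `_env_file=None` and removing variables with pytest's `monkeypatch.delenv` gives the same isolation.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass
from pathlib import Path

DEFAULT_CLAUDE_CMD = "claude"  # hypothetical default


@dataclass(frozen=True)
class Settings:
    claude_cmd: str = DEFAULT_CLAUDE_CMD  # hypothetical subset of the real config


def read_env_file(path: Path) -> dict[str, str]:
    """Tiny KEY=VALUE parser (no quotes or escapes); illustration only."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def load_settings(
    environ: Mapping[str, str] | None = None, env_file: Path | None = Path(".env")
) -> Settings:
    """Merge .env (if present) with the environment; real env vars win over .env."""
    merged: dict[str, str] = {}
    if env_file is not None and env_file.is_file():
        merged.update(read_env_file(env_file))
    merged.update(os.environ if environ is None else environ)
    return Settings(claude_cmd=merged.get("CLAUDE_CMD", DEFAULT_CLAUDE_CMD))
```

In tests, `load_settings(environ={}, env_file=None)` then sees neither the developer's shell nor their `.env`.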
### TODO (Not Yet Implemented)
- [ ] Wire LangGraph checkpointer (PostgreSQL) for interrupt persistence
- [ ] Interrupt decision validation (currently any resume value proceeds)
- [ ] Slack interactive buttons end-to-end (Slack App not yet created)
- [ ] CI/CD pipeline trigger end-to-end testing
- [ ] Release approval gate detection (`check_release_approvals` is a stub)
- [ ] `last_merge_source_commit` from AzDo API for safe merge
- [ ] Operator token auth testing in production
- [ ] Multi-stage Dockerfile for smaller images
- [ ] Centralize `_upsert_thread` into shared `api/db.py` module
- [ ] Remove dead `has_ticket` routing function
- [ ] PR poller dedup query correctness (`unnest` pairwise matching not yet tested against a real database)
- [ ] `archive_release` date injection (replace `date.today()` with config)
- [ ] Approval loop max iteration guard (prevent infinite loops)
- [ ] Migrate existing release JSON data to PostgreSQL
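
Two of the TODO items, validating the resume value and bounding the approval loop, can be sketched together as a routing helper. This is a hypothetical illustration. The decision values, node names (`proceed`, `request_changes`, `abort`), and round limit are assumptions, not the graph's actual schema. In a LangGraph node, the value passed to `parse_decision` would be whatever `interrupt()` returns on resume.

```python
from __future__ import annotations

from enum import Enum

MAX_APPROVAL_ROUNDS = 5  # hypothetical cap on approval round-trips


class Decision(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"


def parse_decision(resume_value: object) -> Decision:
    """Accept only an explicit, known decision; anything else raises ValueError."""
    if isinstance(resume_value, dict):  # e.g. {"decision": "approve"}
        resume_value = resume_value.get("decision")
    if not isinstance(resume_value, str):
        raise ValueError(f"expected a decision string, got {type(resume_value).__name__}")
    try:
        return Decision(resume_value.strip().lower())
    except ValueError:
        allowed = ", ".join(d.value for d in Decision)
        raise ValueError(f"unknown decision {resume_value!r}; expected one of: {allowed}") from None


def next_step(resume_value: object, rounds_so_far: int) -> str:
    """Route after an approval interrupt, giving up once the round cap is reached."""
    if rounds_so_far >= MAX_APPROVAL_ROUNDS:
        return "abort"
    decision = parse_decision(resume_value)
    return "proceed" if decision is Decision.APPROVE else "request_changes"
```

Checking the round cap before parsing means a stuck loop ends even if the final resume value is malformed. Raising on unknown values replaces the current "any resume value proceeds" behaviour.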