Includes: CLAUDE.md, settings.json, agents, commands, rules, skills, hooks, contexts, evals, get-shit-done, plugin configs (installed list and marketplace sources). Excludes credentials, runtime caches, telemetry, session data, and plugin binary cache.
| name | description |
|---|---|
| prod-error-triage | End-to-end production error triage workflow - search logs, diagnose root cause, fix code, create Jira ticket, create branch, commit, and create PR. Use when investigating production errors, log messages, or exceptions. |
Production Error Triage
End-to-end workflow for investigating production errors and shipping fixes.
When to Use
Trigger when the user:
- Pastes a log message or error and asks to investigate
- Asks "why is X failing in prod"
- Wants to trace a production exception
Defaults
- Jira project_key: `ALLPOST`
- Jira component: `BE`
- Azure DevOps org: `https://dev.azure.com/billodev`
- Azure DevOps project: `Billo App Platform`
Workflow
Execute these phases in order. Report findings to the user after each phase before proceeding.
Phase 1: Log Search & Context Gathering
- Search for the error using `mcp__billo-es-logs__search_logs` with the error message or keywords
- Expand the time window if no results (start with `now-1h`, widen to `now-24h`, `now-7d`)
- Get surrounding logs by searching with the same `Correlation-ID` and a narrow time window around the error
- Quantify impact using `count_only: true` to understand if this is isolated or widespread
- Check for patterns - compare error logs with success logs using `sample: true` to find what differs
Key questions to answer:
- How many errors in the last 24h?
- Is it intermittent or constant?
- Which application/service is affected?
- Is there a Correlation-ID to trace the full request?
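The window-widening step can be sketched as a small loop. This is only an illustration: `count_errors` is a hypothetical callable standing in for a `count_only: true` search and does not reflect the real tool interface; the window strings are the ones listed above.

```python
from typing import Callable, Optional, Sequence

# Windows from the workflow above, narrowest first.
TIME_WINDOWS = ("now-1h", "now-24h", "now-7d")


def find_first_window_with_hits(
    count_errors: Callable[[str, str], int],
    query: str,
    windows: Sequence[str] = TIME_WINDOWS,
) -> Optional[tuple[str, int]]:
    """Return (window, hit_count) for the narrowest window that has results.

    `count_errors(query, window)` is a hypothetical stand-in for a
    count-only log search; it is an assumption, not the real tool API.
    """
    for window in windows:
        hits = count_errors(query, window)
        if hits > 0:
            return window, hits
    return None  # nothing found even in the widest window


def fake_count(query: str, window: str) -> int:
    """Fake backend for demonstration: no data in the last hour."""
    return {"now-1h": 0, "now-24h": 2300, "now-7d": 9100}[window]


result = find_first_window_with_hits(fake_count, "BlobAlreadyExists")
print(result)
```

Stopping at the narrowest window that has hits keeps the result set small, which makes the follow-up `Correlation-ID` search easier to read.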
Phase 2: Root Cause Analysis
- Read the stack trace - identify the exact file and line number
- Read the source code at the error location using the file path from the stack trace
- Trace upstream - read the calling code to understand the full flow
- Identify the real error - the logged exception may wrap the actual cause. Look for inner exceptions and upstream error logs with the same Correlation-ID
- Compare success vs failure - if intermittent, determine what condition causes the divergence
Present findings to the user:
- Error chain (what calls what)
- Root cause (the actual bug, not the symptom)
- Why it is intermittent (if applicable)
- Impact scope
Phase 3: Code Fix
- Implement the minimal fix addressing the root cause
- Consider idempotency - if the error is caused by retries, add guards to make the operation safe to retry
- Consider edge cases - identify scenarios that the fix might not cover (e.g. partial completion) and flag them to the user
- Show the diff to the user and get confirmation before proceeding
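To make the idempotency point concrete, here is a minimal sketch of a retry guard. Everything in it is hypothetical: `FakeBlobStore`, `store_invoice`, and the `completed` set are illustrative stand-ins, not the actual handler or storage code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBlobStore:
    """In-memory stand-in for a blob container that rejects duplicate names."""

    blobs: dict = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> None:
        if name in self.blobs:
            # Mimics an "already exists" conflict from a real store.
            raise FileExistsError(f"blob already exists: {name}")
        self.blobs[name] = data


def store_invoice(store: FakeBlobStore, completed: set, invoice_id: str, data: bytes) -> bool:
    """Upload once; return False when a retry finds the work already done."""
    if invoice_id in completed:
        return False  # guard: a previous attempt finished, so skip the upload
    store.upload(f"{invoice_id}.pdf", data)
    completed.add(invoice_id)  # record completion only after a successful upload
    return True


store = FakeBlobStore()
done = set()
first = store_invoice(store, done, "INV-1", b"pdf-bytes")
retry = store_invoice(store, done, "INV-1", b"pdf-bytes")
print(first, retry)
```

This guard alone does not cover partial completion: if the upload succeeds but the process stops before completion is recorded, a retry still hits the duplicate error. That is the kind of edge case to flag to the user.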
Multi-Repo Changes
If the fix spans multiple repos (e.g. Infrastructure + Payment):
- Fix the upstream repo first (e.g. shared library)
- Merge and publish a new NuGet package version
- Update the downstream repo to reference the new version
- Check dependency compatibility before updating:
  - `Microsoft.Extensions.*` major version must match the downstream project's TFM (net9.0 = 9.x)
  - `AWSSDK.*` major version must not conflict with other transitive dependencies (e.g. MongoDB.Driver requires AWSSDK.Core < 4.0)
- Run `dotnet restore` to verify before committing
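The two compatibility rules above can be expressed as a quick pre-check. This is a rough sketch under stated assumptions: the function names are hypothetical, the package list is passed in by hand rather than read from a project file, and only TFMs of the form `netN.M` are handled. It does not replace `dotnet restore`.

```python
import re


def tfm_major(tfm: str) -> int:
    """Extract the major version from a TFM such as 'net9.0'."""
    match = re.fullmatch(r"net(\d+)\.\d+", tfm)
    if match is None:
        raise ValueError(f"unsupported TFM: {tfm}")
    return int(match.group(1))


def package_major(version: str) -> int:
    """Return the first numeric component of a package version string."""
    return int(version.split(".")[0])


def find_conflicts(tfm: str, packages: dict) -> list:
    """Apply the two rules to a {package_name: version} mapping."""
    problems = []
    target = tfm_major(tfm)
    for name, version in packages.items():
        major = package_major(version)
        if name.startswith("Microsoft.Extensions.") and major != target:
            problems.append(f"{name} {version} does not match {tfm}")
        # Constraint quoted in the text: MongoDB.Driver requires AWSSDK.Core < 4.0.
        if name == "AWSSDK.Core" and "MongoDB.Driver" in packages and major >= 4:
            problems.append(f"{name} {version} conflicts with MongoDB.Driver")
    return problems


conflicts = find_conflicts(
    "net9.0",
    {
        "Microsoft.Extensions.Logging": "8.0.1",
        "AWSSDK.Core": "4.0.0.3",
        "MongoDB.Driver": "3.1.0",
    },
)
for problem in conflicts:
    print(problem)
```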
Phase 4: Jira Ticket
Create a ticket using `mcp__billo-es-logs__create_bug_ticket` with:
- project_key: `ALLPOST` (default, ask user if different)
- component: `BE`
- priority: Based on impact (2300+ errors/day = `Highest`)
- summary: Short, searchable - include error type and affected component
- description: Uses lightweight formatting that converts to Jira ADF:
  - Lines ending with `:` become h3 headings (e.g. `Problem:`)
  - Lines starting with `-` become bullet lists
  - Text wrapped in `**` becomes bold
  - Everything else is a paragraph

Example description:
Problem:
DownloadAndSendInvoiceCommandHandler fails with 409 BlobAlreadyExists
Impact:
- 2300+ errors in the last 24 hours
- Affects both regular and **reminder** invoices
Root Cause:
- AzureStorage.StoreFileAsync calls blobClient.UploadAsync() without overwrite flag
- No idempotency check in the handler
Fix:
Add idempotency guard to check **InvoiceTransaction** status before uploading
Files:
- Billo.Platform.Payment.Business/Commands/Handlers/DownloadAndSendInvoiceCommandHandler.cs
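For intuition, the lightweight formatting rules can be mapped onto Atlassian Document Format nodes roughly as follows. This is a sketch of one possible conversion, not the tool's actual implementation: `to_adf` and `inline_nodes` are hypothetical names, and dropping the trailing `:` from heading text and checking the bullet rule before the heading rule are both assumptions.

```python
import re


def inline_nodes(text: str) -> list:
    """Split text on **bold** spans into ADF text nodes."""
    nodes = []
    # re.split with a capture group puts the bold contents at odd indexes.
    for index, part in enumerate(re.split(r"\*\*(.+?)\*\*", text)):
        if not part:
            continue
        node = {"type": "text", "text": part}
        if index % 2 == 1:
            node["marks"] = [{"type": "strong"}]
        nodes.append(node)
    return nodes


def to_adf(description: str) -> dict:
    """Convert the lightweight description format to an ADF document."""
    content = []
    for raw in description.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-"):
            paragraph = {"type": "paragraph", "content": inline_nodes(line[1:].strip())}
            item = {"type": "listItem", "content": [paragraph]}
            # Consecutive bullet lines are merged into one bulletList node.
            if content and content[-1]["type"] == "bulletList":
                content[-1]["content"].append(item)
            else:
                content.append({"type": "bulletList", "content": [item]})
        elif line.endswith(":"):
            # Assumption: the trailing colon is not kept in the heading text.
            content.append(
                {"type": "heading", "attrs": {"level": 3}, "content": inline_nodes(line[:-1])}
            )
        else:
            content.append({"type": "paragraph", "content": inline_nodes(line)})
    return {"type": "doc", "version": 1, "content": content}


doc = to_adf(
    "Problem:\n"
    "Handler fails with 409 BlobAlreadyExists\n"
    "Impact:\n"
    "- 2300+ errors in the last 24 hours\n"
    "- Affects both regular and **reminder** invoices"
)
print([node["type"] for node in doc["content"]])
```

Seeing the target structure helps when a ticket renders oddly: a line that accidentally ends with `:` becomes a heading, and a stray leading `-` starts a list.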
If the API returns 400, likely causes:
- Missing required field (e.g. `component`)
- Invalid `priority` value
- Wrong `project_key`

Use `mcp__billo-es-logs__search_tickets` with an existing ticket key to discover required fields.
Phase 5: Branch & Commit
- Create branch using the naming convention `{prefix}/{TICKET_ID}_{description}`. Choose the prefix that best matches the work type. Any prefix is valid. Examples:
  - `bug/ALLPOST-4228_fix-invoice-upload-blob-already-exists`
  - `fix/ALLPOST-4230_crash`
  - `feature/ALLPOST-4028_login-page`
  - `feat/ALLPOST-4028_login-page`
  - `chore/ALLPOST-4031_cleanup`
- Stage only the changed files - never `git add .`
- Commit with conventional commit format: the subject line `fix: {description} ({TICKET_KEY})`, followed by `{Brief explanation of what and why}`
- Ask before pushing - do not push without user confirmation
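The naming and commit conventions above can be captured in two small helpers. Both functions are hypothetical illustrations; lowercasing and hyphenating the description mirrors the examples in this section, and separating the subject from the explanation with a blank line is an assumption based on the usual conventional-commit layout.

```python
import re


def branch_name(prefix: str, ticket_id: str, description: str) -> str:
    """Build {prefix}/{TICKET_ID}_{description} with a hyphenated description."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{prefix}/{ticket_id}_{slug}"


def commit_message(commit_type: str, description: str, ticket_key: str, body: str) -> str:
    """Subject line, blank line, then the brief explanation."""
    return f"{commit_type}: {description} ({ticket_key})\n\n{body}"


branch = branch_name("bug", "ALLPOST-4228", "Fix invoice upload: blob already exists")
message = commit_message(
    "fix",
    "add idempotency guard to invoice upload",
    "ALLPOST-4228",
    "Retries re-uploaded the same blob and failed with 409.",
)
print(branch)
print(message)
```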
Phase 6: Create PR
Create PR using Azure DevOps CLI:
az repos pr create \
--org "https://dev.azure.com/billodev" \
--project "Billo App Platform" \
--detect false \
--repository "{REPO_NAME}" \
--source-branch "{BRANCH}" \
--target-branch "develop" \
--title "{type}: {description} ({TICKET_KEY})" \
--description "{summary of changes}"
Notes:
- `--project` is required; the command errors without it
- `--detect false` avoids auto-detection issues
- Return the PR URL to the user when done
Tools Reference
| Phase | Tool | Purpose |
|---|---|---|
| Log search | `mcp__billo-es-logs__search_logs` | Search with query, time range, level, application |
| Impact | `mcp__billo-es-logs__search_logs` with `count_only: true` | Count matching errors |
| Patterns | `mcp__billo-es-logs__search_logs` with `sample: true` | Random sample from large result sets |
| Source code | Read, Glob, Grep | Find and read source files |
| Ticket lookup | `mcp__billo-es-logs__search_tickets` | Find existing tickets or discover field requirements |
| Ticket create | `mcp__billo-es-logs__create_bug_ticket` | Create Jira bug ticket |
| Git | Bash | Branch, commit, push |
| PR | `az repos pr create` | Create Azure DevOps pull request |
Tips
- Always search logs before reading code - the logs tell you where to look
- Use `Correlation-ID` to trace a single request across services
- When errors are intermittent, the root cause is often in retry/concurrency behavior, not in the happy path
- When updating shared NuGet packages, always verify transitive dependency compatibility with downstream projects before publishing
- Flag edge cases to the user rather than silently ignoring them