chore: initial backup of Claude Code configuration

Includes: CLAUDE.md, settings.json, agents, commands, rules, skills,
hooks, contexts, evals, get-shit-done, plugin configs (installed list
and marketplace sources). Excludes credentials, runtime caches,
telemetry, session data, and plugin binary cache.
Yaojia Wang
2026-03-24 22:26:05 +01:00
commit 2876cca8fe
245 changed files with 54437 additions and 0 deletions


@@ -0,0 +1,174 @@
---
name: prod-error-triage
description: End-to-end production error triage workflow - search logs, diagnose root cause, fix code, create Jira ticket, create branch, commit, and create PR. Use when investigating production errors, log messages, or exceptions.
---
# Production Error Triage
End-to-end workflow for investigating production errors and shipping fixes.
## When to Use
Trigger when the user:
- Pastes a log message or error and asks to investigate
- Asks "why is X failing in prod"
- Wants to trace a production exception
## Defaults
- **Jira project_key**: `ALLPOST`
- **Jira component**: `BE`
- **Azure DevOps org**: `https://dev.azure.com/billodev`
- **Azure DevOps project**: `Billo App Platform`
## Workflow
Execute these phases in order. Report findings to the user after each phase before proceeding.
### Phase 1: Log Search & Context Gathering
1. **Search for the error** using `mcp__billo-es-logs__search_logs` with the error message or keywords
2. **Expand the time window** if there are no results (start with `now-1h`, then widen to `now-24h` and `now-7d`)
3. **Get surrounding logs** by searching with the same `Correlation-ID` and a narrow time window around the error
4. **Quantify impact** using `count_only: true` to determine whether the error is isolated or widespread
5. **Check for patterns** - compare error logs with success logs using `sample: true` to find what differs
Key questions to answer:
- How many errors in the last 24h?
- Is it intermittent or constant?
- Which application/service is affected?
- Is there a Correlation-ID to trace the full request?
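The widening search from step 2 can be sketched as a simple loop. The MCP tool is invoked by the agent rather than from a shell, so `count_errors` below is a hypothetical stand-in that returns canned counts; only the three time windows come from the workflow above.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a count_only log search (not a real CLI).
# Prints the number of matches for query "$1" since start time "$2".
count_errors() {
  case "$2" in
    now-1h) echo 0 ;;   # canned value: nothing in the last hour
    *)      echo 12 ;;  # canned value: matches in any wider window
  esac
}

# Try each window in order and stop at the first one that returns matches.
first_window_with_hits() {
  local query=$1 window hits
  for window in now-1h now-24h now-7d; do
    hits=$(count_errors "$query" "$window")
    if [ "$hits" -gt 0 ]; then
      echo "$window $hits"
      return 0
    fi
  done
  return 1  # no matches even in the widest window
}

first_window_with_hits "BlobAlreadyExists"
```

Stopping at the narrowest window that has hits keeps the result set small, which makes the later Correlation-ID search easier to scope.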
### Phase 2: Root Cause Analysis
1. **Read the stack trace** - identify the exact file and line number
2. **Read the source code** at the error location using the file path from the stack trace
3. **Trace upstream** - read the calling code to understand the full flow
4. **Identify the real error** - the logged exception may wrap the actual cause. Look for inner exceptions and upstream error logs with the same Correlation-ID
5. **Compare success vs failure** - if intermittent, determine what condition causes the divergence
Present findings to the user:
- Error chain (what calls what)
- Root cause (the actual bug, not the symptom)
- Why it is intermittent (if applicable)
- Impact scope
### Phase 3: Code Fix
1. **Implement the minimal fix** addressing the root cause
2. **Consider idempotency** - if the error is caused by retries, add guards to make the operation safe to retry
3. **Consider edge cases** - identify scenarios the fix might not cover (e.g. partial completion) and flag them to the user
4. **Show the diff** to the user and get confirmation before proceeding
#### Multi-Repo Changes
If the fix spans multiple repos (e.g. Infrastructure + Payment):
1. Fix the upstream repo first (e.g. shared library)
2. Merge and publish a new NuGet package version
3. Update the downstream repo to reference the new version
4. **Check dependency compatibility before updating**:
   - `Microsoft.Extensions.*` major version must match the downstream project's TFM (net9.0 = 9.x)
   - `AWSSDK.*` major version must not conflict with other transitive dependencies (e.g. MongoDB.Driver requires AWSSDK.Core < 4.0)
   - Run `dotnet restore` to verify before committing
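As a rough illustration of the `Microsoft.Extensions.*` rule, the sketch below compares a package's major version with the major version of a TFM. The function names are hypothetical, and the parsing assumes modern TFMs of the form `netN.M` (it does not handle `netstandard2.0`, `netcoreapp3.1`, or `net48`).

```bash
#!/usr/bin/env bash
# Extract the major version from a TFM such as "net9.0" -> "9".
# Assumes the modern "netN.M" form only.
tfm_major() {
  local tfm=${1#net}   # strip the "net" prefix
  echo "${tfm%%.*}"    # keep everything before the first dot
}

# Extract the major version from a package version such as "9.0.3" -> "9".
package_major() {
  echo "${1%%.*}"
}

# Succeeds when the package major version equals the TFM major version.
extensions_version_matches_tfm() {
  [ "$(package_major "$1")" = "$(tfm_major "$2")" ]
}

if extensions_version_matches_tfm "9.0.3" "net9.0"; then
  echo "compatible"
else
  echo "major version mismatch"
fi
```

A check like this only covers the version-number rule. Transitive conflicts such as the `AWSSDK.Core` constraint still need `dotnet restore`, and `dotnet list package --include-transitive` can help show which resolved versions are involved.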
### Phase 4: Jira Ticket
Create a ticket using `mcp__billo-es-logs__create_bug_ticket` with:
- **project_key**: `ALLPOST` (default, ask user if different)
- **component**: `BE`
- **priority**: Based on impact (e.g. 2300+ errors/day = `Highest`)
- **summary**: Short, searchable - include error type and affected component
- **description**: Uses lightweight formatting that converts to Jira ADF:
  - Lines ending with `:` become **h3 headings** (e.g. `Problem:`)
  - Lines starting with `- ` become **bullet lists**
  - Text wrapped in `**` becomes **bold**
  - Everything else is a paragraph
```
Problem:
DownloadAndSendInvoiceCommandHandler fails with 409 BlobAlreadyExists
Impact:
- 2300+ errors in the last 24 hours
- Affects both regular and **reminder** invoices
Root Cause:
- AzureStorage.StoreFileAsync calls blobClient.UploadAsync() without overwrite flag
- No idempotency check in the handler
Fix:
Add idempotency guard to check **InvoiceTransaction** status before uploading
Files:
- Billo.Platform.Payment.Business/Commands/Handlers/DownloadAndSendInvoiceCommandHandler.cs
```
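The line-level rules above can be sketched as a small classifier. This is an illustration only, not the tool's actual converter: `classify_line` is a hypothetical name, the bullet-before-heading precedence for a line that matches both rules is an assumption, and inline `**bold**` is not handled because it applies within a line rather than to a whole line.

```bash
#!/usr/bin/env bash
# Classify one description line by the lightweight formatting rules.
# Assumption: a line that both starts with "- " and ends with ":" is a bullet.
classify_line() {
  case "$1" in
    "- "*) echo "bullet" ;;     # starts with "- "
    *:)    echo "heading" ;;    # ends with ":"
    *)     echo "paragraph" ;;  # everything else
  esac
}

classify_line "Problem:"                             # prints: heading
classify_line "- 2300+ errors in the last 24 hours"  # prints: bullet
classify_line "Add idempotency guard to check status before uploading"  # prints: paragraph
```

Running a draft description through a check like this before submitting can catch a sentence that accidentally ends with `:` and would render as a heading.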
If the API returns 400, likely causes:
- Missing required field (e.g. `component`)
- Invalid `priority` value
- Wrong `project_key`
Use `mcp__billo-es-logs__search_tickets` with an existing ticket key to discover required fields.
### Phase 5: Branch & Commit
1. **Create branch** using the naming convention `{prefix}/{TICKET_ID}_{description}`:
```
bug/ALLPOST-4228_fix-invoice-upload-blob-already-exists
fix/ALLPOST-4230_crash
feature/ALLPOST-4028_login-page
feat/ALLPOST-4028_login-page
chore/ALLPOST-4031_cleanup
```
Choose the prefix that best matches the work type. Any prefix is valid.
2. **Stage only the changed files** - never `git add .`
3. **Commit** with conventional commit format:
```
fix: {description} ({TICKET_KEY})

{Brief explanation of what and why}
```
4. **Ask before pushing** - do not push without user confirmation
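The branch and commit steps above might look like the following sketch. `branch_name` is a hypothetical helper, and its slug rule (lowercase, hyphen-separated) is an assumption inferred from the example branch names rather than a stated convention. The git commands are shown as comments so the sketch has no side effects.

```bash
#!/usr/bin/env bash
# Build a branch name in the form {prefix}/{TICKET_ID}_{description}.
# The description is lowercased and runs of non-alphanumerics become hyphens.
branch_name() {
  local prefix=$1 ticket=$2 slug
  slug=$(printf '%s' "$3" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')
  echo "${prefix}/${ticket}_${slug}"
}

branch_name bug ALLPOST-4228 "Fix invoice upload: blob already exists"
# prints: bug/ALLPOST-4228_fix-invoice-upload-blob-already-exists

# Typical follow-up (not executed here):
#   git switch -c "$(branch_name bug ALLPOST-4228 "...")"
#   # Stage explicit paths only, never `git add .`
#   git add Billo.Platform.Payment.Business/Commands/Handlers/DownloadAndSendInvoiceCommandHandler.cs
#   # Two -m flags produce a subject line and a body separated by a blank line
#   git commit -m "fix: {description} ({TICKET_KEY})" -m "{Brief explanation of what and why}"
```

Passing the subject and body as separate `-m` arguments keeps the blank line that the conventional commit format expects between them.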
### Phase 6: Create PR
Create PR using Azure DevOps CLI:
```bash
az repos pr create \
--org "https://dev.azure.com/billodev" \
--project "Billo App Platform" \
--detect false \
--repository "{REPO_NAME}" \
--source-branch "{BRANCH}" \
--target-branch "develop" \
--title "{type}: {description} ({TICKET_KEY})" \
--description "{summary of changes}"
```
Notes:
- `--project` is required; the command fails without it
- `--detect false` disables automatic detection from the local git config, which avoids auto-detection issues
- Return the PR URL to the user when done
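One way to produce the URL to return is to compose it from the PR ID, as in this sketch. `pr_web_url` is a hypothetical helper, `my-repo` and `123` are placeholder values, and the URL layout is the usual Azure DevOps web form, which is worth confirming against a real PR in your organization.

```bash
#!/usr/bin/env bash
# Compose the web URL of a pull request from its parts.
# Only spaces in the project name are percent-encoded; other special
# characters are not handled in this sketch.
pr_web_url() {
  local org_url=$1 project=$2 repo=$3 pr_id=$4
  project=$(printf '%s' "$project" | sed 's/ /%20/g')
  echo "${org_url}/${project}/_git/${repo}/pullrequest/${pr_id}"
}

# The ID could be captured from the create command (not executed here):
#   pr_id=$(az repos pr create ... --query pullRequestId --output tsv)

pr_web_url "https://dev.azure.com/billodev" "Billo App Platform" "my-repo" 123
# prints: https://dev.azure.com/billodev/Billo%20App%20Platform/_git/my-repo/pullrequest/123
```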
## Tools Reference
| Phase | Tool | Purpose |
|-------|------|---------|
| Log search | `mcp__billo-es-logs__search_logs` | Search with query, time range, level, application |
| Impact | `mcp__billo-es-logs__search_logs` with `count_only: true` | Count matching errors |
| Patterns | `mcp__billo-es-logs__search_logs` with `sample: true` | Random sample from large result sets |
| Source code | `Read`, `Glob`, `Grep` | Find and read source files |
| Ticket lookup | `mcp__billo-es-logs__search_tickets` | Find existing tickets or discover field requirements |
| Ticket create | `mcp__billo-es-logs__create_bug_ticket` | Create Jira bug ticket |
| Git | `Bash` | Branch, commit, push |
| PR | `az repos pr create` | Create Azure DevOps pull request |
## Tips
- Always search logs before reading code - the logs tell you where to look
- Use `Correlation-ID` to trace a single request across services
- When errors are intermittent, the root cause is often in retry/concurrency behavior, not in the happy path
- When updating shared NuGet packages, always verify transitive dependency compatibility with downstream projects before publishing
- Flag edge cases to the user rather than silently ignoring them