knowledge-base/4 - Resources/Claude-Code/Ralphinho RFC-DAG 编排模式.md
2026-04-06 23:27:39 +02:00


created: 2026-04-06
type: resource
tags: resource, claude-code, AI-tools, ralphinho, RFC, DAG, multi-agent, orchestration, ECC
source: ~/.claude/skills/ralphinho-rfc-pipeline/SKILL.md

Ralphinho RFC-DAG Orchestration Pattern

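The most complex of the autonomous loop patterns. It decomposes an RFC/PRD into a dependency DAG, executes the DAG layer by layer with parallelism inside each layer, sends every unit through a quality pipeline whose depth depends on its tier, and finally lands the work through a merge queue. Created by enitrat.

Related notes: Autonomous Loops (autonomous loop patterns), dmux multi-agent parallel orchestration, Autonomous Agent Harness (autonomous agent framework), ECC orchestration alternatives (orchestrate migration)

Architecture overview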

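RFC document
  |
  v
AI decomposes it into WorkUnits (with a dependency DAG)
  |
  v
RALPH LOOP (up to 3 passes)
  |
  +-- Execute by DAG layer (parallel within a layer):
  |     each unit in its own worktree:
  |     Research -> Plan -> Implement -> Test -> Review
  |     (depth tiered by complexity)
  |
  +-- Merge queue:
        Rebase onto main -> Run tests -> Land or Evict
        Evicted units re-enter with their conflict context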

WorkUnit definition

interface WorkUnit {
  id: string;              // kebab-case identifier
  name: string;            // human-readable name
  rfcSections: string[];   // RFC sections this unit covers
  description: string;     // detailed description
  deps: string[];          // dependencies (IDs of other units)
  acceptance: string[];    // concrete acceptance criteria
  tier: "trivial" | "small" | "medium" | "large";
}

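Decomposition principles

  • Prefer fewer, more cohesive units (lower merge risk)
  • Minimize file overlap across units (avoids conflicts)
  • Keep tests with the implementation (do not split into "implement X" + "test X")
  • Declare a dependency only where a real code dependency exists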

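DAG layered execution

The dependency DAG determines execution order:

Layer 0: [unit-a, unit-b]      <- no dependencies, run in parallel
Layer 1: [unit-c]              <- depends on unit-a
Layer 2: [unit-d, unit-e]      <- depend on unit-c

Units within a layer run in parallel; layers run one after another.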
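The layering above can be derived mechanically from each unit's `deps`. The following is a minimal sketch, not the pipeline's actual implementation: `UnitNode` and `computeLayers` are hypothetical names, and the rule assumed here is that a unit goes into the first layer in which all of its dependencies have already been placed.

```typescript
// Hypothetical subset of WorkUnit: only the fields needed for layering.
interface UnitNode {
  id: string;
  deps: string[];
}

// Groups units into layers. A unit becomes ready once every dep sits in an
// earlier layer. Throws on unknown dependency IDs and on dependency cycles.
function computeLayers(units: UnitNode[]): string[][] {
  const known = new Set<string>();
  for (const u of units) known.add(u.id);
  for (const u of units) {
    for (const d of u.deps) {
      if (!known.has(d)) {
        throw new Error(`unit ${u.id} depends on unknown unit ${d}`);
      }
    }
  }

  const done = new Set<string>();
  const layers: string[][] = [];
  let remaining = units.slice();
  while (remaining.length > 0) {
    const ready = remaining.filter((u) => u.deps.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("dependency cycle detected");
    layers.push(ready.map((u) => u.id));
    for (const u of ready) done.add(u.id);
    remaining = remaining.filter((u) => !done.has(u.id));
  }
  return layers;
}

// The example from the text.
const exampleLayers = computeLayers([
  { id: "unit-a", deps: [] },
  { id: "unit-b", deps: [] },
  { id: "unit-c", deps: ["unit-a"] },
  { id: "unit-d", deps: ["unit-c"] },
  { id: "unit-e", deps: ["unit-c"] },
]);
// exampleLayers is [["unit-a", "unit-b"], ["unit-c"], ["unit-d", "unit-e"]]
```

Failing fast on a cycle matters here because a cyclic plan would otherwise leave the loop with units that never become ready.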

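Complexity-tiered pipeline

Each complexity tier goes through a quality pipeline of a different depth:

| Tier | Pipeline stages |
|------|-----------------|
| trivial | implement -> test |
| small | implement -> test -> code-review |
| medium | research -> plan -> implement -> test -> PRD-review + code-review -> review-fix |
| large | research -> plan -> implement -> test -> PRD-review + code-review -> review-fix -> final-review |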
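The tier table can be encoded as a lookup, which keeps the pipeline depth a data decision rather than branching logic. This is only a sketch: the `Stage` identifiers and the `STAGES_BY_TIER` and `stagesFor` names are assumptions, and the two reviews that the table pairs with "+" are simply listed one after the other here.

```typescript
type Tier = "trivial" | "small" | "medium" | "large";

// Hypothetical stage identifiers mirroring the table.
type Stage =
  | "research"
  | "plan"
  | "implement"
  | "test"
  | "prd-review"
  | "code-review"
  | "review-fix"
  | "final-review";

const STAGES_BY_TIER: Record<Tier, Stage[]> = {
  trivial: ["implement", "test"],
  small: ["implement", "test", "code-review"],
  medium: [
    "research", "plan", "implement", "test",
    "prd-review", "code-review", "review-fix",
  ],
  large: [
    "research", "plan", "implement", "test",
    "prd-review", "code-review", "review-fix", "final-review",
  ],
};

// Returns a copy so callers cannot mutate the shared table.
function stagesFor(tier: Tier): Stage[] {
  return STAGES_BY_TIER[tier].slice();
}

// True when the given tier includes the given stage.
function runsStage(tier: Tier, stage: Stage): boolean {
  return STAGES_BY_TIER[tier].indexOf(stage) !== -1;
}
```

For instance, `runsStage("small", "research")` is false while `runsStage("large", "final-review")` is true, matching the table.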

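Separate context windows (eliminates self-review bias)

Each stage runs in its own agent process, so the reviewer is never the author.

| Stage | Model | Purpose |
|-------|-------|---------|
| Research | Sonnet | Read the code and the RFC, produce a context document |
| Plan | Opus | Design the implementation steps |
| Implement | Codex/Sonnet | Write the code |
| Test | Sonnet | Run the build and the tests |
| PRD Review | Sonnet | Check compliance with the spec |
| Code Review | Opus | Check quality and security |
| Review Fix | Codex/Sonnet | Address the review comments |
| Final Review | Opus | Quality gate (large tier only) |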

Merge queue

Unit branch
  |
  +-- Rebase onto main
  |     Conflict? -> EVICT (capture conflict context)
  |
  +-- Run build + tests
  |     Failure? -> EVICT (capture test output)
  |
  +-- Pass -> Fast-forward main, push, delete branch
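The land-or-evict decision in the diagram can be sketched as a small function. Everything named here (`RepoOps`, `StepOutcome`, `LandResult`, `tryLand`) is hypothetical; the git and build operations are injected so the control flow can be read and tested without a repository, and a real implementation would shell out to git and the project's build.

```typescript
interface StepOutcome {
  ok: boolean;
  output: string; // captured stdout/stderr, kept as eviction context
}

// Hypothetical abstraction over git and the build system.
interface RepoOps {
  rebaseOntoMain(branch: string): StepOutcome;
  runBuildAndTests(branch: string): StepOutcome;
  // Assumed to fast-forward main, push, and delete the branch.
  fastForwardMain(branch: string): void;
}

type LandResult =
  | { status: "landed" }
  | {
      status: "evicted";
      reason: "rebase-conflict" | "tests-failed";
      context: string;
    };

// Mirrors the diagram: rebase, then build + tests, then land.
// Any failure evicts the unit and carries the captured output along.
function tryLand(branch: string, ops: RepoOps): LandResult {
  const rebase = ops.rebaseOntoMain(branch);
  if (!rebase.ok) {
    return { status: "evicted", reason: "rebase-conflict", context: rebase.output };
  }
  const tests = ops.runBuildAndTests(branch);
  if (!tests.ok) {
    return { status: "evicted", reason: "tests-failed", context: tests.output };
  }
  ops.fastForwardMain(branch);
  return { status: "landed" };
}
```

Returning the eviction reason and context as data, rather than throwing, lets the caller feed them straight into the unit's next implementation pass.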

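File-overlap awareness

  • Units with no file overlap: land speculatively in parallel
  • Units with overlapping files: land one at a time, rebasing before each landing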
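One simple way to decide which units fall into which bucket is to index touched files by unit and flag any file claimed by more than one unit. This sketch assumes each unit reports the files it changed; `UnitFiles` and `partitionByOverlap` are hypothetical names, and the actual pipeline may detect overlap differently.

```typescript
interface UnitFiles {
  id: string;
  files: string[]; // paths the unit's branch touches
}

interface LandingPlan {
  parallel: string[];   // no overlap with any other unit
  sequential: string[]; // shares at least one file with another unit
}

function partitionByOverlap(units: UnitFiles[]): LandingPlan {
  // file path -> IDs of the units that touch it
  const owners = new Map<string, Set<string>>();
  for (const u of units) {
    for (const f of u.files) {
      let ids = owners.get(f);
      if (ids === undefined) {
        ids = new Set<string>();
        owners.set(f, ids);
      }
      ids.add(u.id);
    }
  }

  // Any unit touching a file with more than one owner must land serially.
  const overlapping = new Set<string>();
  owners.forEach((ids) => {
    if (ids.size > 1) ids.forEach((id) => overlapping.add(id));
  });

  const plan: LandingPlan = { parallel: [], sequential: [] };
  for (const u of units) {
    (overlapping.has(u.id) ? plan.sequential : plan.parallel).push(u.id);
  }
  return plan;
}
```

With units `a` touching `x.ts`, `b` touching `x.ts` and `y.ts`, and `c` touching `z.ts`, the plan puts `c` in `parallel` and `a`, `b` in `sequential`.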

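Eviction recovery

When a unit is evicted, the full context (conflicting files, diff, test output) is passed to its next implementation pass: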

## MERGE CONFLICT -- RESOLVE BEFORE NEXT LANDING

Your previous implementation conflicted with another unit that landed first.
Restructure your changes to avoid the conflicting files/lines below.

{full eviction context and diff}
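Filling in that template is plain string assembly. The sketch below assumes the eviction context is carried as three fields matching the items named in the text; the `EvictionContext` shape, its field names, and the section labels after the fixed header are illustrative, not the pipeline's actual format.

```typescript
// Hypothetical shape for the captured eviction context.
interface EvictionContext {
  conflictingFiles: string[];
  diff: string;
  testOutput: string; // empty when the eviction came from a rebase conflict
}

function buildEvictionPrompt(ctx: EvictionContext): string {
  const lines: string[] = [
    "## MERGE CONFLICT -- RESOLVE BEFORE NEXT LANDING",
    "",
    "Your previous implementation conflicted with another unit that landed first.",
    "Restructure your changes to avoid the conflicting files/lines below.",
    "",
    "Conflicting files:",
    ...ctx.conflictingFiles.map((f) => `- ${f}`),
    "",
    "Diff:",
    ctx.diff,
  ];
  // Only add the test section when there is output to show.
  if (ctx.testOutput.length > 0) {
    lines.push("", "Test output:", ctx.testOutput);
  }
  return lines.join("\n");
}
```

The resulting string would be prepended to the implement stage's prompt on the next pass, which is what turns the retry into a targeted one.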

Data flow between stages

research.contextFilePath --------> plan
plan.implementationSteps --------> implement
implement.{filesCreated} --------> test, reviews
test.failingSummary ------------> reviews, implement (next pass)
reviews.{feedback} -------------> review-fix -> implement (next pass)
final-review.reasoning ---------> implement (next pass)
evictionContext -----------------> implement (after merge conflict)
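The arrows above suggest a per-unit record of stage outputs from which the next implement pass draws its input. The field names below come from the diagram, but their types, the `StageOutputs` container, and the `nextPassNotes` helper are assumptions made for illustration.

```typescript
// Field names follow the diagram; the types are assumed.
interface StageOutputs {
  research?: { contextFilePath: string };
  plan?: { implementationSteps: string[] };
  implement?: { filesCreated: string[] };
  test?: { failingSummary: string };
  reviews?: { feedback: string[] };
  finalReview?: { reasoning: string };
  evictionContext?: string;
}

// Collects everything the diagram routes into "implement (next pass)":
// failing tests, review feedback, final-review reasoning, eviction context.
function nextPassNotes(prev: StageOutputs): string[] {
  const notes: string[] = [];
  if (prev.test && prev.test.failingSummary.length > 0) {
    notes.push(`failing tests: ${prev.test.failingSummary}`);
  }
  if (prev.reviews) {
    for (const item of prev.reviews.feedback) {
      notes.push(`review feedback: ${item}`);
    }
  }
  if (prev.finalReview) {
    notes.push(`final review: ${prev.finalReview.reasoning}`);
  }
  if (prev.evictionContext !== undefined) {
    notes.push(`eviction: ${prev.evictionContext}`);
  }
  return notes;
}
```

A first pass with no prior outputs yields an empty list, so the same implement prompt can be used for every pass.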

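Worktree isolation

Each unit runs in its own worktree. The pipeline stages of a given unit share that worktree, so state carries over from stage to stage (context files, plan files, code changes).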


Worked example: multi-tenant refactor of smart-support

Step 1: Write the RFC

# RFC: Multi-Tenant Agent Architecture

## Goal
Support multiple tenants, each with own agent config and conversation history.

## Work Units
1. tenant-model: Tenant SQLAlchemy model + migration
2. tenant-middleware: FastAPI middleware, extract tenant from JWT
3. agent-scoping: Scope agent registry per tenant
4. conversation-isolation: Filter conversations by tenant_id
5. frontend-tenant-selector: Tenant switcher in UI header
6. e2e-multi-tenant: E2E test for full flow

## Dependencies
tenant-model -> tenant-middleware -> agent-scoping
tenant-model -> conversation-isolation
agent-scoping + conversation-isolation -> frontend-tenant-selector
all -> e2e-multi-tenant

Step 2: DAG decomposition

Layer 0: [tenant-model]                                    # tier: small
Layer 1: [tenant-middleware, conversation-isolation]        # tier: medium, small
Layer 2: [agent-scoping]                                   # tier: medium
Layer 3: [frontend-tenant-selector]                        # tier: small
Layer 4: [e2e-multi-tenant]                                # tier: small

Step 3: Execution script

#!/bin/bash
set -e

# --- Layer 0: tenant-model (small: implement -> test -> review) ---
claude -p --model sonnet "Implement Tenant SQLAlchemy model in backend/app/models/tenant.py.
  Fields: id, name, api_key_hash, created_at. Write migration. Tests first."
claude -p --model opus "Review changes for security (api_key hashing) and schema design."

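# --- Layer 1: parallel (medium + small) ---
# Note: this simplified script runs both units in the same checkout;
# the full pattern gives each unit its own worktree.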

# tenant-middleware (medium: research -> plan -> implement -> test -> review)
(
  claude -p --model sonnet --allowedTools "Read,Grep,Glob" \
    "Research how FastAPI middleware works in this project. Document in /tmp/middleware-research.md"
  claude -p --model opus \
    "Read /tmp/middleware-research.md. Plan tenant extraction from JWT. Write to /tmp/middleware-plan.md"
  claude -p --model sonnet \
    "Read /tmp/middleware-plan.md. Implement tenant middleware. Tests first."
  claude -p --model opus \
    "Review tenant-middleware changes for security and correctness."
) &
PID1=$!

# conversation-isolation (small: implement -> test -> review)
(
  claude -p --model sonnet \
    "Add tenant_id to conversations table. Filter all conversation queries by tenant_id. Tests first."
  claude -p --model opus \
    "Review conversation-isolation changes."
) &
PID2=$!

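# Wait on each job separately: `wait PID1 PID2` returns only the last
# job's exit status, so a failure in the first unit would slip past `set -e`.
wait "$PID1"
wait "$PID2"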

# De-sloppify Layer 1
claude -p "Review all uncommitted changes. Remove test slop. Run pytest --cov=app."

# --- Layer 2: agent-scoping (medium) ---
claude -p --model sonnet --allowedTools "Read,Grep,Glob" \
  "Research how backend/app/registry.py loads agents. Document in /tmp/registry-research.md"
claude -p --model opus \
  "Read /tmp/registry-research.md. Plan tenant-scoped agent loading. Write to /tmp/scoping-plan.md"
claude -p --model sonnet \
  "Read /tmp/scoping-plan.md. Implement tenant-scoped agent loading. Tests first."
claude -p --model opus \
  "Review agent-scoping changes for correctness and security."

# --- Layer 3: frontend (small) ---
claude -p "Add tenant selector to frontend header. Call GET /api/tenants.
  Store selected tenant in context. Pass tenant_id header on all API calls."

# --- Layer 4: E2E (small) ---
claude -p "Write E2E test in backend/tests/e2e/test_multi_tenant.py:
  1. Create two tenants
  2. Send chat as tenant A
  3. Verify tenant B cannot see A's conversations
  Run pytest -m e2e"

# --- Final verification ---
claude -p "Run pytest --cov=app --cov-report=term-missing. Fix any failures."

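When to use Ralphinho vs. simpler patterns

| Signal | Use Ralphinho | Use a simpler pattern |
|--------|---------------|-----------------------|
| Multiple interdependent work units | Yes | No |
| Parallel implementation needed | Yes | No |
| Merge conflicts likely | Yes | No (sequential is enough) |
| Single-file change | No | Yes (sequential) |
| Multi-day project | Yes | Maybe (continuous-claude) |
| Spec/RFC already written | Yes | Maybe |
| Fast iteration on a single thing | No | Yes (NanoClaw or pipeline) |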

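Key design principles

  1. Deterministic execution: upfront decomposition locks in parallelism and ordering
  2. Human review at the high-leverage points: the work plan is the highest-leverage place to intervene
  3. Separation of concerns: every stage gets its own context and its own agent
  4. Conflict recovery with context: not a blind retry
  5. Tiered depth: trivial skips research and review; large gets the heaviest review
  6. Resumable workflow: state is persisted to SQLite, so a run can resume from any point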