ColaFlow/docs/plans/sprint_5_story_0.md
Yaojia Wang 34a379750f
2025-11-15 08:58:48 +01:00


| story_id | sprint_id | status | priority | assignee | created_date | story_type | estimated_weeks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| story_0 | sprint_5 | not_started | P0 | backend | 2025-11-09 | epic | 8 |

Story 0 (EPIC): Integrate Microsoft .NET MCP SDK

Type: Epic / Feature Story
Priority: P0 - Critical Infrastructure Improvement
Estimated Effort: 8 weeks (40 working days)

Epic Goal

Migrate ColaFlow from its custom MCP implementation to Microsoft's official .NET MCP SDK using a hybrid architecture. The SDK will handle the protocol layer, Tool/Resource registration, and transport, while ColaFlow retains its own business logic (Diff Preview, multi-tenant isolation, Pending Changes).

Business Value

Why This Matters

  1. Code Reduction: 60-70% less protocol boilerplate (protocol parsing, JSON-RPC, handshake)
  2. Performance Gain: 30-40% faster response times expected from SDK optimizations (research estimate; the committed target is ≥ 20%)
  3. Maintenance: Protocol updates arrive through Microsoft-maintained SDK releases instead of manual changes
  4. Standard Compliance: MCP specification compliance is delegated to the SDK rather than hand-written protocol code
  5. Developer Experience: Attribute-based registration (cleaner, more intuitive)

Success Metrics

  • Code Reduction: Remove 500-700 lines of custom protocol code
  • Performance: ≥ 20% response time improvement
  • Test Coverage: Maintain ≥ 80% coverage
  • Zero Breaking Changes: All existing MCP clients work without changes
  • SDK Integration: 100% of Tools and Resources migrated

Research Context

Research Report: docs/research/mcp-sdk-integration-research.md

Key findings from research team:

  • SDK Maturity: Reported as production-ready (v1.0+) with 4000+ GitHub stars; re-check the current release status when pinning a version in Phase 1
  • Architecture Fit: Excellent fit with ColaFlow's Clean Architecture
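  • Attribute System: Attribute-based registration simplifies Tool and Resource setup (written as [McpTool] and [McpResource] in this plan; confirm the exact attribute names against the SDK during the PoC)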
  • Transport Options: stdio (CLI), HTTP/SSE (Server), WebSocket (future)
  • Performance: Faster JSON parsing, optimized middleware
  • Compatibility: Supports Claude Desktop, Continue, Cline

Hybrid Architecture Strategy

What SDK Handles (Replace Custom Code)

┌──────────────────────────────────────┐
│  Microsoft .NET MCP SDK              │
├──────────────────────────────────────┤
│  ✅ Protocol Layer                   │
│    - JSON-RPC 2.0 parsing            │
│    - MCP handshake (initialize)      │
│    - Request/response routing        │
│    - Error handling                  │
│                                      │
│  ✅ Transport Layer                  │
│    - stdio (Standard In/Out)         │
│    - HTTP/SSE (Server-Sent Events)   │
│    - WebSocket (future)              │
│                                      │
│  ✅ Registration System              │
│    - Attribute-based discovery       │
│    - Tool/Resource/Prompt catalog    │
│    - Schema validation               │
└──────────────────────────────────────┘
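
For orientation, the protocol layer being handed to the SDK deals with JSON-RPC 2.0 envelopes such as an MCP tools/call request. The sketch below is not SDK code. It parses one such envelope with System.Text.Json to illustrate the kind of work the custom protocol handler does by hand today. The ToolCall and JsonRpcSketch names are hypothetical.

```csharp
using System;
using System.Text.Json;

// Illustrative only: ToolCall and JsonRpcSketch are hypothetical names, not SDK types.
public sealed record ToolCall(string Id, string ToolName, JsonElement Arguments);

public static class JsonRpcSketch
{
    // Parses a JSON-RPC 2.0 "tools/call" request envelope.
    public static ToolCall ParseToolCall(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        if (root.GetProperty("jsonrpc").GetString() != "2.0")
            throw new FormatException("Not a JSON-RPC 2.0 message.");
        if (root.GetProperty("method").GetString() != "tools/call")
            throw new FormatException("Not a tools/call request.");

        var p = root.GetProperty("params");
        return new ToolCall(
            Id: root.GetProperty("id").GetRawText(),        // id may be a number or a string
            ToolName: p.GetProperty("name").GetString() ?? "",
            Arguments: p.GetProperty("arguments").Clone()); // Clone so it outlives the document
    }
}
```

Everything in this sketch (envelope validation, method routing, argument extraction) is the category of code the migration intends to delete in favour of the SDK.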

What ColaFlow Keeps (Business Logic)

┌──────────────────────────────────────┐
│  ColaFlow Business Layer             │
├──────────────────────────────────────┤
│  🔒 Security & Multi-Tenant          │
│    - TenantContext extraction        │
│    - API Key authentication          │
│    - Field-level permissions         │
│                                      │
│  🔍 Diff Preview System              │
│    - Before/after snapshots          │
│    - Changed fields detection        │
│    - HTML diff generation            │
│                                      │
│  ✅ Approval Workflow                │
│    - PendingChange management        │
│    - Human approval required         │
│    - SignalR notifications           │
│                                      │
│  📊 Advanced Features                │
│    - Redis caching                   │
│    - Audit logging                   │
│    - Rate limiting                   │
└──────────────────────────────────────┘
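
The "changed fields detection" step of the Diff Preview System can be pictured as a comparison of two flat snapshots. This is a minimal sketch under that assumption. The real DiffPreviewService API is not shown in this plan, and FieldChange and DiffSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper types; not the actual DiffPreviewService contract.
public sealed record FieldChange(string Field, object? Before, object? After);

public static class DiffSketch
{
    // Compares two flat snapshots and returns the fields whose values differ.
    // A field missing from one snapshot is treated as null on that side.
    public static IReadOnlyList<FieldChange> ChangedFields(
        IReadOnlyDictionary<string, object?> before,
        IReadOnlyDictionary<string, object?> after)
    {
        return before.Keys.Union(after.Keys)
            .OrderBy(k => k, StringComparer.Ordinal)
            .Select(k => new FieldChange(
                k,
                before.TryGetValue(k, out var b) ? b : null,
                after.TryGetValue(k, out var a) ? a : null))
            .Where(c => !object.Equals(c.Before, c.After))
            .ToList();
    }
}
```

Because this logic only depends on the snapshots, it is unaffected by which protocol layer delivered the request, which is why it stays on the ColaFlow side of the hybrid split.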

Migration Phases (8 Weeks)

Phase 1: Foundation (Week 1-2) - Story 13

Goal: Setup SDK infrastructure and validate compatibility

Tasks:

  1. Install the SDK NuGet package (listed in this plan as Microsoft.MCP; confirm the exact package ID on NuGet before pinning a version)
  2. Create PoC Tool/Resource using SDK
  3. Verify compatibility with existing architecture
  4. Performance baseline benchmarks
  5. Team training on SDK APIs

Deliverables:

  • SDK installed and configured
  • PoC validates SDK works with ColaFlow
  • Performance baseline report
  • Migration guide for developers

Acceptance Criteria:

  • SDK integrated into ColaFlow.Modules.Mcp project
  • PoC Tool successfully called from Claude Desktop
  • Performance baseline recorded (response time, throughput)
  • Zero conflicts with existing Clean Architecture
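
The performance baseline is later compared against P95 targets and a ≥ 20% improvement goal, so it helps to fix how those two numbers are computed. The following is one possible sketch using the nearest-rank percentile definition. BaselineSketch is a hypothetical helper, and the choice of nearest-rank is an assumption rather than something the plan specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BaselineSketch
{
    // Nearest-rank percentile: the smallest sample such that at least
    // `percentile` percent of all samples are less than or equal to it.
    public static double Percentile(IEnumerable<double> samplesMs, double percentile)
    {
        var sorted = samplesMs.OrderBy(x => x).ToArray();
        if (sorted.Length == 0)
            throw new ArgumentException("No samples.", nameof(samplesMs));
        if (percentile <= 0 || percentile > 100)
            throw new ArgumentOutOfRangeException(nameof(percentile));

        int rank = (int)Math.Ceiling(percentile * sorted.Length / 100.0);
        return sorted[rank - 1];
    }

    // Relative improvement of `currentMs` over `baselineMs`, in percent
    // (positive means the current measurement is faster).
    public static double ImprovementPercent(double baselineMs, double currentMs)
        => (baselineMs - currentMs) / baselineMs * 100.0;
}
```

For example, a baseline P95 of 200 ms and a post-migration P95 of 150 ms give an improvement of 25%, which would satisfy the ≥ 20% target.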

Phase 2: Tool Migration (Week 3-4) - Story 14

Goal: Migrate all 10 Tools to SDK attribute-based registration

Tools to Migrate:

  1. create_issue → [McpTool] attribute
  2. update_status → [McpTool] attribute
  3. add_comment → [McpTool] attribute
  4. assign_issue → [McpTool] attribute
  5. create_sprint → [McpTool] attribute
  6. update_sprint → [McpTool] attribute
  7. log_decision → [McpTool] attribute
  8. generate_prd → [McpTool] attribute
  9. split_epic → [McpTool] attribute
  10. detect_risks → [McpTool] attribute

Migration Pattern:

// BEFORE (Custom)
public class CreateIssueTool : IMcpTool
{
    public string Name => "create_issue";
    public string Description => "Create a new issue";
    public McpToolInputSchema InputSchema => ...;

    public async Task<McpToolResult> ExecuteAsync(...)
    {
        // Custom routing logic
    }
}

// AFTER (SDK)
// Attribute names are as planned in this document; confirm them against the SDK version pinned in Phase 1.
[McpTool(
    Name = "create_issue",
    Description = "Create a new issue (Epic/Story/Task)"
)]
public class CreateIssueTool
{
    [McpToolParameter(Required = true)]
    public Guid ProjectId { get; set; }

    [McpToolParameter(Required = true)]
    public string Title { get; set; } = string.Empty; // initialized to avoid a nullable-reference warning

    [McpToolParameter]
    public string? Description { get; set; }

    public async Task<McpToolResult> ExecuteAsync(
        McpContext context,
        CancellationToken cancellationToken)
    {
        // Business logic stays the same
        // DiffPreviewService integration preserved
    }
}
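
To make "attribute-based registration" concrete, the sketch below shows the general mechanism with plain reflection: a catalog is built by scanning an assembly for classes that carry a marker attribute. ToolSketchAttribute and the other types here are stand-ins invented for illustration. They are not the SDK's attribute types, and the SDK's own discovery may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Stand-in attribute for illustration; this is NOT the SDK's attribute type.
[AttributeUsage(AttributeTargets.Class)]
public sealed class ToolSketchAttribute : Attribute
{
    public ToolSketchAttribute(string name) => Name = name;
    public string Name { get; }
}

[ToolSketch("create_issue")]
public sealed class CreateIssueToolSketch { }

[ToolSketch("add_comment")]
public sealed class AddCommentToolSketch { }

public static class ToolCatalogSketch
{
    // Scans an assembly for classes carrying the attribute and maps tool name -> type.
    // ToDictionary throws on duplicate names, which surfaces naming clashes at startup.
    public static IReadOnlyDictionary<string, Type> Discover(Assembly assembly)
    {
        return assembly.GetTypes()
            .Select(t => (Type: t, Attr: t.GetCustomAttribute<ToolSketchAttribute>()))
            .Where(x => x.Attr is not null)
            .ToDictionary(x => x.Attr!.Name, x => x.Type);
    }
}
```

This is the role the custom registry plays today with explicit interface implementations. With attributes, adding a Tool means adding one annotated class rather than editing a registration list.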

Deliverables:

  • All 10 Tools migrated to SDK attributes
  • DiffPreviewService integration maintained
  • Integration tests updated
  • Performance comparison report

Acceptance Criteria:

  • All Tools work with [McpTool] attribute
  • Diff Preview workflow preserved (no breaking changes)
  • Integration tests pass (≥80% coverage)
  • Performance improvement measured (target: 20%+)

Phase 3: Resource Migration (Week 5) - Story 15

Goal: Migrate all 11 Resources to SDK attribute-based registration

Resources to Migrate:

  1. projects.list → [McpResource]
  2. projects.get/{id} → [McpResource]
  3. issues.search → [McpResource]
  4. issues.get/{id} → [McpResource]
  5. sprints.current → [McpResource]
  6. sprints.list → [McpResource]
  7. users.list → [McpResource]
  8. docs.prd/{projectId} → [McpResource]
  9. reports.daily/{date} → [McpResource]
  10. reports.velocity → [McpResource]
  11. audit.history/{entityId} → [McpResource]
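
Several of these Resources carry placeholders such as {id} or {date}. As a rough illustration of how a templated URI can be matched against a concrete request, here is a small regex-based sketch. It assumes the templates are prefixed with the colaflow:// scheme used elsewhere in this plan. UriTemplateSketch is a hypothetical helper, and the SDK may provide its own template handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UriTemplateSketch
{
    // Matches a concrete URI against a template such as
    // "colaflow://projects.get/{id}" and extracts the placeholder values.
    // Returns null when the URI does not match the template.
    public static IReadOnlyDictionary<string, string>? Match(string template, string uri)
    {
        // Escape the literal parts, then turn each escaped "{name}" into a named group.
        // Regex.Escape escapes '{' but not '}', so the escaped form is "\{name}".
        string pattern = "^" + Regex.Replace(
            Regex.Escape(template),
            @"\\\{(\w+)\}",
            m => $"(?<{m.Groups[1].Value}>[^/]+)") + "$";

        var match = Regex.Match(uri, pattern);
        if (!match.Success) return null;

        var values = new Dictionary<string, string>();
        foreach (Group g in match.Groups)
        {
            // Skip the implicit group "0", which is the whole match.
            if (g.Name != "0") values[g.Name] = g.Value;
        }
        return values;
    }
}
```

The extracted values are plain strings. Parsing them into Guid or date types, and rejecting malformed input, would remain the Resource's responsibility.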

Migration Pattern:

// BEFORE (Custom)
public class ProjectsListResource : IMcpResource
{
    public string Uri => "colaflow://projects.list";
    public string Name => "Projects List";

    public async Task<McpResourceContent> GetContentAsync(...)
    {
        // Custom logic
    }
}

// AFTER (SDK)
[McpResource(
    Uri = "colaflow://projects.list",
    Name = "Projects List",
    Description = "List all projects in current tenant",
    MimeType = "application/json"
)]
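public class ProjectsListResource
{
    private readonly IProjectRepository _repo;
    private readonly ITenantContext _tenant;
    // Redis-backed cache via Microsoft.Extensions.Caching.Distributed;
    // IMemoryCache would be in-process only and is not Redis
    private readonly IDistributedCache _cache;

    // Dependencies are supplied by the DI container
    public ProjectsListResource(
        IProjectRepository repo,
        ITenantContext tenant,
        IDistributedCache cache)
    {
        _repo = repo;
        _tenant = tenant;
        _cache = cache;
    }

    public async Task<McpResourceContent> GetContentAsync(
        McpContext context,
        CancellationToken cancellationToken)
    {
        // Business logic stays the same
        // Multi-tenant filtering preserved
        // Redis caching preserved
    }
}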
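
Keeping both "multi-tenant filtering" and "Redis caching" intact implies that cached entries must never be shared across tenants. One common way to guarantee that is to embed the tenant id in every cache key. The sketch below assumes tenants are identified by a Guid. TenantCacheKeySketch and the mcp: key prefix are hypothetical.

```csharp
using System;

public static class TenantCacheKeySketch
{
    // Builds a cache key that always embeds the tenant id, so one tenant's
    // cached payload cannot be served to another tenant.
    public static string For(Guid tenantId, string resourceUri)
    {
        if (tenantId == Guid.Empty)
            throw new ArgumentException("A tenant id is required.", nameof(tenantId));
        if (string.IsNullOrWhiteSpace(resourceUri))
            throw new ArgumentException("A resource URI is required.", nameof(resourceUri));

        // "N" formats the Guid as 32 hex digits without dashes.
        return $"mcp:{tenantId:N}:{resourceUri}";
    }
}
```

Rejecting Guid.Empty is deliberate: a request that reaches the cache without a resolved tenant fails loudly instead of falling into a shared key.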

Deliverables:

  • All 11 Resources migrated to SDK attributes
  • Multi-tenant isolation verified
  • Redis caching maintained
  • Performance tests passed

Acceptance Criteria:

  • All Resources work with [McpResource] attribute
  • Multi-tenant isolation 100% verified
  • Redis cache hit rate > 80% maintained
  • Response time < 200ms (P95)

Phase 4: Transport Layer (Week 6) - Story 16

Goal: Replace custom HTTP middleware with SDK transport

Current Custom Transport:

// Custom middleware (will be removed)
app.UseMiddleware<McpProtocolMiddleware>();
app.UseMiddleware<ApiKeyAuthMiddleware>();

SDK Transport Configuration:

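// SDK-based transport
// Method names below show the intended configuration shape; confirm them against the pinned SDK version
builder.Services.AddMcpServer(options =>
{
    // stdio transport (for locally launched clients such as Claude Desktop)
    options.UseStdioTransport();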

    // HTTP/SSE transport (for web-based clients)
    options.UseHttpTransport(http =>
    {
        http.BasePath = "/mcp";
        http.EnableSse = true; // Server-Sent Events
    });

    // Custom authentication (preserve API Key auth)
    options.AddAuthentication<ApiKeyAuthHandler>();

    // Custom authorization (preserve field-level permissions)
    options.AddAuthorization<FieldLevelAuthHandler>();
});
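
Whatever shape the SDK's authentication hook takes, the API Key check itself is ColaFlow code that moves across unchanged. As a sketch of what that check might look like, the example below assumes keys are high-entropy random strings stored as SHA-256 hashes. The actual ApiKeyAuthHandler is not shown in this plan, and ApiKeySketch is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ApiKeySketch
{
    // Hashes the presented key so that only hashes are stored and compared.
    public static byte[] Hash(string apiKey)
        => SHA256.HashData(Encoding.UTF8.GetBytes(apiKey));

    // Constant-time comparison avoids leaking how many leading bytes matched.
    public static bool Matches(string presentedKey, byte[] storedHash)
        => CryptographicOperations.FixedTimeEquals(Hash(presentedKey), storedHash);
}
```

A timing-safe comparison is relevant to the "API Key brute force test" planned for the Week 7 security audit.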

Deliverables:

  • Custom middleware removed
  • SDK transport configured (stdio + HTTP/SSE)
  • API Key authentication migrated to SDK pipeline
  • Field-level permissions preserved

Acceptance Criteria:

  • stdio transport works (Claude Desktop compatibility)
  • HTTP/SSE transport works (web client compatibility)
  • API Key authentication functional
  • Field-level permissions enforced
  • Zero breaking changes for existing clients

Phase 5: Testing & Optimization (Week 7-8) - Story 17

Goal: Comprehensive testing, performance tuning, and documentation

Week 7: Integration Testing & Bug Fixes

Tasks:

  1. End-to-End Testing

    • Claude Desktop integration test (stdio)
    • Web client integration test (HTTP/SSE)
    • Multi-tenant isolation verification
    • Diff Preview workflow validation
  2. Performance Testing

    • Benchmark Tools (target: 20%+ improvement)
    • Benchmark Resources (target: 30%+ improvement)
    • Concurrent request testing (100 req/s)
    • Memory usage profiling
  3. Security Audit

    • API Key brute force test
    • Cross-tenant access attempts
    • Field-level permission bypass tests
    • SQL injection attempts
  4. Bug Fixes

    • Fix integration test failures
    • Address performance bottlenecks
    • Fix security vulnerabilities (if found)

Week 8: Documentation & Production Readiness

Tasks:

  1. Architecture Documentation

    • Update mcp-server-architecture.md with SDK details
    • Create SDK migration guide for developers
    • Document hybrid architecture decisions
    • Add troubleshooting guide
  2. API Documentation

    • Update OpenAPI/Swagger specs
    • Document Tool parameter schemas
    • Document Resource URI patterns
    • Add example requests/responses
  3. Code Cleanup

    • Remove old custom protocol code
    • Delete obsolete interfaces (IMcpTool, IMcpResource)
    • Clean up unused NuGet packages
    • Update code comments
  4. Production Readiness

    • Deploy to staging environment
    • Smoke testing with real AI clients
    • Performance validation
    • Final code review

Deliverables:

  • Comprehensive test suite (≥80% coverage)
  • Performance report (vs. baseline)
  • Security audit report (zero CRITICAL issues)
  • Updated architecture documentation
  • Production deployment guide

Acceptance Criteria:

  • Integration tests pass (≥80% coverage)
  • Performance improved by ≥20%
  • Security audit clean (0 CRITICAL, 0 HIGH)
  • Documentation complete and reviewed
  • Production-ready checklist signed off

Stories Breakdown

This Epic is broken down into 5 child Stories:

  • Story 13 - MCP SDK Foundation & PoC (Week 1-2) - not_started
  • Story 14 - Tool Migration to SDK (Week 3-4) - not_started
  • Story 15 - Resource Migration to SDK (Week 5) - not_started
  • Story 16 - Transport Layer Migration (Week 6) - not_started
  • Story 17 - Testing & Optimization (Week 7-8) - not_started

Progress: 0/5 stories completed (0%)

Risk Assessment

High-Priority Risks

| Risk ID | Description | Impact | Probability | Mitigation |
| --- | --- | --- | --- | --- |
| RISK-001 | SDK breaking changes during migration | High | Low | Lock SDK version, gradual migration |
| RISK-002 | Performance regression | High | Medium | Continuous benchmarking, rollback plan |
| RISK-003 | DiffPreview integration conflicts | Medium | Medium | Thorough testing, preserve interfaces |
| RISK-004 | Client compatibility issues | High | Low | Test with Claude Desktop early |
| RISK-005 | Multi-tenant isolation bugs | Critical | Very Low | 100% test coverage, security audit |

Mitigation Strategies

  1. Phased Migration: 5 phases allow early detection of issues
  2. Parallel Systems: Keep old code until SDK fully validated
  3. Feature Flags: Enable/disable SDK via configuration
  4. Rollback Plan: Can revert to custom implementation if needed
  5. Continuous Testing: Run tests after each phase
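
The "Parallel Systems" and "Feature Flags" strategies together suggest a single switch that selects the legacy or SDK pipeline at startup. This is a minimal sketch of such a switch. The Mcp:UseSdk setting name and all types here are hypothetical, and a real implementation would likely read from IConfiguration rather than a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstractions standing in for the two pipelines.
public interface IMcpPipelineSketch { string Name { get; } }
public sealed class LegacyPipelineSketch : IMcpPipelineSketch { public string Name => "custom"; }
public sealed class SdkPipelineSketch : IMcpPipelineSketch { public string Name => "sdk"; }

public static class PipelineSelectorSketch
{
    // Reads a hypothetical "Mcp:UseSdk" setting. Anything other than "true"
    // (case-insensitive) keeps the legacy pipeline, so the safe default is the old code.
    public static IMcpPipelineSketch Select(IReadOnlyDictionary<string, string> settings)
    {
        bool useSdk = settings.TryGetValue("Mcp:UseSdk", out var raw)
                      && bool.TryParse(raw, out var parsed)
                      && parsed;

        return useSdk ? new SdkPipelineSketch() : new LegacyPipelineSketch();
    }
}
```

Defaulting to the legacy pipeline when the flag is missing or malformed means a misconfigured deployment falls back to known behaviour, which is what the rollback plan relies on.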

Dependencies

Prerequisites

  • Sprint 5 Phase 1-3 infrastructure (Stories 1-12)
  • Custom MCP implementation complete and working
  • DiffPreview service production-ready
  • Multi-tenant security verified

External Dependencies

  • Microsoft .NET MCP SDK v1.0+ (NuGet)
  • Claude Desktop 1.0+ (for testing)
  • Continue VS Code Extension (for testing)

Technical Requirements

  • .NET 9+ (already installed)
  • PostgreSQL 15+ (already configured)
  • Redis 7+ (already configured)

Acceptance Criteria (Epic-Level)

Functional Requirements

  • All 10 Tools migrated to SDK [McpTool] attributes
  • All 11 Resources migrated to SDK [McpResource] attributes
  • stdio transport works (Claude Desktop compatible)
  • HTTP/SSE transport works (web client compatible)
  • Diff Preview workflow preserved (no breaking changes)
  • Multi-tenant isolation 100% verified
  • API Key authentication functional
  • Field-level permissions enforced

Performance Requirements

  • Response time improved by ≥20%
  • Tool execution time < 500ms (P95)
  • Resource query time < 200ms (P95)
  • Throughput ≥100 requests/second
  • Memory usage optimized (no leaks)

Quality Requirements

  • Test coverage ≥80%
  • Zero CRITICAL security vulnerabilities
  • Zero HIGH security vulnerabilities
  • Code duplication <5%
  • All integration tests pass

Documentation Requirements

  • Architecture documentation updated
  • API documentation complete
  • Migration guide published
  • Troubleshooting guide published
  • Code examples updated

Success Metrics

Code Quality

  • Lines Removed: 500-700 lines of custom protocol code
  • Code Duplication: <5%
  • Test Coverage: ≥80%
  • Security Score: 0 CRITICAL, 0 HIGH vulnerabilities

Performance

  • Response Time: 20-40% improvement
  • Throughput: 100+ req/s (from 70 req/s)
  • Memory Usage: 10-20% reduction
  • Cache Hit Rate: >80% maintained

Developer Experience

  • Onboarding Time: 50% faster (simpler SDK APIs)
  • Code Readability: +30% (attributes vs. manual registration)
  • Maintenance Effort: -60% (Microsoft maintains protocol)

Research & Design

Sprint Planning

Technical References


Notes

Why Hybrid Architecture?

Question: Why not use 100% SDK?

Answer: ColaFlow has unique business requirements:

  1. Diff Preview: SDK doesn't provide preview mechanism (ColaFlow custom)
  2. Approval Workflow: SDK doesn't have human-in-the-loop (ColaFlow custom)
  3. Multi-Tenant: SDK doesn't enforce tenant isolation (ColaFlow custom)
  4. Field Permissions: SDK doesn't have field-level security (ColaFlow custom)

Hybrid approach gets best of both worlds:

  • SDK handles boring protocol stuff (60-70% code reduction)
  • ColaFlow handles business-critical stuff (security, approval)
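
To illustrate why the approval workflow has to stay in ColaFlow, the following sketch models the human-in-the-loop rule as a tiny state machine: a pending change can be decided exactly once, and only by a named reviewer. The real PendingChange entity is not shown in this plan, so PendingChangeSketch and its members are assumptions.

```csharp
using System;

public enum PendingChangeStatus { Pending, Approved, Rejected }

// Hypothetical shape; not the actual PendingChange entity.
public sealed class PendingChangeSketch
{
    public PendingChangeStatus Status { get; private set; } = PendingChangeStatus.Pending;
    public string? DecidedBy { get; private set; }

    public void Approve(string reviewer) => Decide(PendingChangeStatus.Approved, reviewer);
    public void Reject(string reviewer) => Decide(PendingChangeStatus.Rejected, reviewer);

    // A change can be decided exactly once, and only by a named reviewer.
    private void Decide(PendingChangeStatus outcome, string reviewer)
    {
        if (Status != PendingChangeStatus.Pending)
            throw new InvalidOperationException($"Change is already {Status}.");
        if (string.IsNullOrWhiteSpace(reviewer))
            throw new ArgumentException("A reviewer is required.", nameof(reviewer));

        Status = outcome;
        DecidedBy = reviewer;
    }
}
```

Rules like these are domain invariants rather than protocol concerns, so a migrated Tool would still create a pending change and wait for approval instead of writing directly.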

What Gets Deleted?

Custom Code to Remove (~700 lines):

  • McpProtocolHandler.cs (JSON-RPC parsing)
  • McpProtocolMiddleware.cs (HTTP middleware)
  • IMcpTool.cs interface (replaced by SDK attributes)
  • IMcpResource.cs interface (replaced by SDK attributes)
  • McpRegistry.cs (replaced by SDK discovery)
  • McpRequest.cs / McpResponse.cs DTOs (SDK provides)

Custom Code to Keep (~1200 lines):

  • DiffPreviewService.cs (business logic)
  • PendingChangeService.cs (approval workflow)
  • ApiKeyAuthHandler.cs (security)
  • FieldLevelAuthHandler.cs (permissions)
  • TenantContextService.cs (multi-tenant)

Timeline Justification

Why 8 weeks?

  • Week 1-2: PoC + training (can't rush, need to understand SDK)
  • Week 3-4: 10 Tools migration (careful testing required)
  • Week 5: 11 Resources migration (simpler than Tools)
  • Week 6: Transport layer (critical, can't break clients)
  • Week 7-8: Testing + docs (quality gate, can't skip)

Could it be faster?

  • Yes, if we skip testing (NOT RECOMMENDED)
  • Yes, if we accept higher risk (NOT RECOMMENDED)
  • This is already an aggressive timeline (1.6 weeks per phase on average)

Post-Migration Benefits

Developer Velocity:

  • New Tool creation: 30 min (was 2 hours)
  • New Resource creation: 15 min (was 1 hour)
  • Onboarding new developers: 2 days (was 5 days)

Maintenance Burden:

  • Protocol updates: 0 hours (Microsoft handles)
  • Bug fixes: -60% effort (less custom code)
  • Feature additions: +40% faster (SDK simplifies)

Created: 2025-11-09 by Product Manager Agent
Epic Owner: Backend Team Lead
Estimated Start: 2025-11-27 (After Sprint 5 Phase 1-3)
Estimated Completion: 2026-01-22 (Week 8 of Sprint 5)
Status: Not Started (planning complete)