---
story_id: story_0
sprint_id: sprint_5
status: not_started
priority: P0
assignee: backend
created_date: 2025-11-09
story_type: epic
estimated_weeks: 8
---

# Story 0 (EPIC): Integrate Microsoft .NET MCP SDK

**Type**: Epic / Feature Story
**Priority**: P0 - Critical Infrastructure Improvement
**Estimated Effort**: 8 weeks (40 working days)

## Epic Goal

Migrate ColaFlow from its custom MCP implementation to Microsoft's official .NET MCP SDK using a hybrid architecture approach. The SDK will handle the protocol layer, Tool/Resource registration, and transport, while ColaFlow retains its unique business logic (Diff Preview, multi-tenant isolation, Pending Changes).

## Business Value

### Why This Matters

1. **Code Reduction**: 60-70% less boilerplate code (protocol parsing, JSON-RPC, handshake)
2. **Performance Gain**: 30-40% faster response times (SDK optimizations)
3. **Maintenance**: Microsoft-maintained protocol updates (no manual updates)
4. **Standard Compliance**: 100% MCP specification compliance guaranteed
5.
   **Developer Experience**: Attribute-based registration (cleaner, more intuitive)

### Success Metrics

- **Code Reduction**: Remove 500-700 lines of custom protocol code
- **Performance**: ≥ 20% response time improvement
- **Test Coverage**: Maintain ≥ 80% coverage
- **Zero Breaking Changes**: All existing MCP clients work without changes
- **SDK Integration**: 100% of Tools and Resources migrated

## Research Context

**Research Report**: `docs/research/mcp-sdk-integration-research.md`

Key findings from research team:

- **SDK Maturity**: Production-ready (v1.0+), 4000+ GitHub stars
- **Architecture Fit**: Excellent fit with ColaFlow's Clean Architecture
- **Attribute System**: `[McpTool]`, `[McpResource]` attributes simplify registration
- **Transport Options**: stdio (CLI), HTTP/SSE (Server), WebSocket (future)
- **Performance**: Faster JSON parsing, optimized middleware
- **Compatibility**: Supports Claude Desktop, Continue, Cline

## Hybrid Architecture Strategy

### What SDK Handles (Replace Custom Code)

```
┌──────────────────────────────────────┐
│        Microsoft .NET MCP SDK        │
├──────────────────────────────────────┤
│  ✅ Protocol Layer                   │
│     - JSON-RPC 2.0 parsing           │
│     - MCP handshake (initialize)     │
│     - Request/response routing       │
│     - Error handling                 │
│                                      │
│  ✅ Transport Layer                  │
│     - stdio (Standard In/Out)        │
│     - HTTP/SSE (Server-Sent Events)  │
│     - WebSocket (future)             │
│                                      │
│  ✅ Registration System              │
│     - Attribute-based discovery      │
│     - Tool/Resource/Prompt catalog   │
│     - Schema validation              │
└──────────────────────────────────────┘
```

### What ColaFlow Keeps (Business Logic)

```
┌──────────────────────────────────────┐
│       ColaFlow Business Layer        │
├──────────────────────────────────────┤
│  🔒 Security & Multi-Tenant          │
│     - TenantContext extraction       │
│     - API Key authentication         │
│     - Field-level permissions        │
│                                      │
│  🔍 Diff Preview System              │
│     - Before/after snapshots         │
│     - Changed fields detection       │
│     - HTML diff generation           │
│                                      │
│  ✅ Approval Workflow                │
│     - PendingChange management       │
│     - Human approval required        │
│     - SignalR notifications          │
│                                      │
│  📊 Advanced Features                │
│     - Redis caching                  │
│     - Audit logging                  │
│     - Rate limiting                  │
└──────────────────────────────────────┘
```

## Migration Phases (8 Weeks)

### Phase 1: Foundation (Week 1-2) - Story 13

**Goal**: Set up SDK infrastructure and validate compatibility

**Tasks**:

1. Install `Microsoft.MCP` NuGet package
2. Create PoC Tool/Resource using SDK
3. Verify compatibility with existing architecture
4. Performance baseline benchmarks
5. Team training on SDK APIs

**Deliverables**:

- SDK installed and configured
- PoC validates SDK works with ColaFlow
- Performance baseline report
- Migration guide for developers

**Acceptance Criteria**:

- [ ] SDK integrated into ColaFlow.Modules.Mcp project
- [ ] PoC Tool successfully called from Claude Desktop
- [ ] Performance baseline recorded (response time, throughput)
- [ ] Zero conflicts with existing Clean Architecture

---

### Phase 2: Tool Migration (Week 3-4) - Story 14

**Goal**: Migrate all 10 Tools to SDK attribute-based registration

**Tools to Migrate**:

1. `create_issue` → `[McpTool]` attribute
2. `update_status` → `[McpTool]` attribute
3. `add_comment` → `[McpTool]` attribute
4. `assign_issue` → `[McpTool]` attribute
5. `create_sprint` → `[McpTool]` attribute
6. `update_sprint` → `[McpTool]` attribute
7. `log_decision` → `[McpTool]` attribute
8. `generate_prd` → `[McpTool]` attribute
9. `split_epic` → `[McpTool]` attribute
10. `detect_risks` → `[McpTool]` attribute

**Migration Pattern**:

```csharp
// BEFORE (Custom)
public class CreateIssueTool : IMcpTool
{
    public string Name => "create_issue";
    public string Description => "Create a new issue";
    public McpToolInputSchema InputSchema => ...;

    public async Task ExecuteAsync(...)
    {
        // Custom routing logic
    }
}

// AFTER (SDK)
[McpTool(
    Name = "create_issue",
    Description = "Create a new issue (Epic/Story/Task)"
)]
public class CreateIssueTool
{
    [McpToolParameter(Required = true)]
    public Guid ProjectId { get; set; }

    [McpToolParameter(Required = true)]
    public string Title { get; set; }

    [McpToolParameter]
    public string? Description { get; set; }

    public async Task ExecuteAsync(
        McpContext context,
        CancellationToken cancellationToken)
    {
        // Business logic stays the same
        // DiffPreviewService integration preserved
    }
}
```

**Deliverables**:

- All 10 Tools migrated to SDK attributes
- DiffPreviewService integration maintained
- Integration tests updated
- Performance comparison report

**Acceptance Criteria**:

- [ ] All Tools work with `[McpTool]` attribute
- [ ] Diff Preview workflow preserved (no breaking changes)
- [ ] Integration tests pass (>80% coverage)
- [ ] Performance improvement measured (target: 20%+)

---

### Phase 3: Resource Migration (Week 5) - Story 15

**Goal**: Migrate all 11 Resources to SDK attribute-based registration

**Resources to Migrate**:

1. `projects.list` → `[McpResource]`
2. `projects.get/{id}` → `[McpResource]`
3. `issues.search` → `[McpResource]`
4. `issues.get/{id}` → `[McpResource]`
5. `sprints.current` → `[McpResource]`
6. `sprints.list` → `[McpResource]`
7. `users.list` → `[McpResource]`
8. `docs.prd/{projectId}` → `[McpResource]`
9. `reports.daily/{date}` → `[McpResource]`
10. `reports.velocity` → `[McpResource]`
11. `audit.history/{entityId}` → `[McpResource]`

**Migration Pattern**:

```csharp
// BEFORE (Custom)
public class ProjectsListResource : IMcpResource
{
    public string Uri => "colaflow://projects.list";
    public string Name => "Projects List";

    public async Task GetContentAsync(...)
    {
        // Custom logic
    }
}

// AFTER (SDK)
[McpResource(
    Uri = "colaflow://projects.list",
    Name = "Projects List",
    Description = "List all projects in current tenant",
    MimeType = "application/json"
)]
public class ProjectsListResource
{
    private readonly IProjectRepository _repo;
    private readonly ITenantContext _tenant;
    private readonly IDistributedCache _cache; // Redis-backed cache preserved

    public async Task GetContentAsync(
        McpContext context,
        CancellationToken cancellationToken)
    {
        // Business logic stays the same
        // Multi-tenant filtering preserved
        // Redis caching preserved
    }
}
```

**Deliverables**:

- All 11 Resources migrated to SDK attributes
- Multi-tenant isolation verified
- Redis caching maintained
- Performance tests passed

**Acceptance Criteria**:

- [ ] All Resources work with `[McpResource]` attribute
- [ ] Multi-tenant isolation 100% verified
- [ ] Redis cache hit rate > 80% maintained
- [ ] Response time < 200ms (P95)

---

### Phase 4: Transport Layer (Week 6) - Story 16

**Goal**: Replace custom HTTP middleware with SDK transport

**Current Custom Transport**:

```csharp
// Custom middleware (will be removed)
app.UseMiddleware();
app.UseMiddleware();
```

**SDK Transport Configuration**:

```csharp
// SDK-based transport
builder.Services.AddMcpServer(options =>
{
    // stdio transport (for locally launched clients such as Claude Desktop)
    options.UseStdioTransport();

    // HTTP/SSE transport (for web-based clients)
    options.UseHttpTransport(http =>
    {
        http.BasePath = "/mcp";
        http.EnableSse = true; // Server-Sent Events
    });

    // Custom authentication (preserve API Key auth)
    options.AddAuthentication();

    // Custom authorization (preserve field-level permissions)
    options.AddAuthorization();
});
```

**Deliverables**:

- Custom middleware removed
- SDK transport configured (stdio + HTTP/SSE)
- API Key authentication migrated to SDK pipeline
- Field-level permissions preserved

**Acceptance Criteria**:

- [ ] stdio transport works (Claude Desktop compatibility)
- [ ] HTTP/SSE transport works (web client
  compatibility)
- [ ] API Key authentication functional
- [ ] Field-level permissions enforced
- [ ] Zero breaking changes for existing clients

---

### Phase 5: Testing & Optimization (Week 7-8) - Story 17

**Goal**: Comprehensive testing, performance tuning, and documentation

#### Week 7: Integration Testing & Bug Fixes

**Tasks**:

1. **End-to-End Testing**
   - Claude Desktop integration test (stdio)
   - Web client integration test (HTTP/SSE)
   - Multi-tenant isolation verification
   - Diff Preview workflow validation
2. **Performance Testing**
   - Benchmark Tools (target: 20%+ improvement)
   - Benchmark Resources (target: 30%+ improvement)
   - Concurrent request testing (100 req/s)
   - Memory usage profiling
3. **Security Audit**
   - API Key brute force test
   - Cross-tenant access attempts
   - Field-level permission bypass tests
   - SQL injection attempts
4. **Bug Fixes**
   - Fix integration test failures
   - Address performance bottlenecks
   - Fix security vulnerabilities (if found)

#### Week 8: Documentation & Production Readiness

**Tasks**:

1. **Architecture Documentation**
   - Update `mcp-server-architecture.md` with SDK details
   - Create SDK migration guide for developers
   - Document hybrid architecture decisions
   - Add troubleshooting guide
2. **API Documentation**
   - Update OpenAPI/Swagger specs
   - Document Tool parameter schemas
   - Document Resource URI patterns
   - Add example requests/responses
3. **Code Cleanup**
   - Remove old custom protocol code
   - Delete obsolete interfaces (IMcpTool, IMcpResource)
   - Clean up unused NuGet packages
   - Update code comments
4. **Production Readiness**
   - Deploy to staging environment
   - Smoke testing with real AI clients
   - Performance validation
   - Final code review

**Deliverables**:

- Comprehensive test suite (>80% coverage)
- Performance report (vs.
  baseline)
- Security audit report (zero CRITICAL issues)
- Updated architecture documentation
- Production deployment guide

**Acceptance Criteria**:

- [ ] Integration tests pass (>80% coverage)
- [ ] Performance improved by ≥20%
- [ ] Security audit clean (0 CRITICAL, 0 HIGH)
- [ ] Documentation complete and reviewed
- [ ] Production-ready checklist signed off

---

## Stories Breakdown

This Epic is broken down into 5 child Stories:

- [ ] [Story 13](sprint_5_story_13.md) - MCP SDK Foundation & PoC (Week 1-2) - `not_started`
- [ ] [Story 14](sprint_5_story_14.md) - Tool Migration to SDK (Week 3-4) - `not_started`
- [ ] [Story 15](sprint_5_story_15.md) - Resource Migration to SDK (Week 5) - `not_started`
- [ ] [Story 16](sprint_5_story_16.md) - Transport Layer Migration (Week 6) - `not_started`
- [ ] [Story 17](sprint_5_story_17.md) - Testing & Optimization (Week 7-8) - `not_started`

**Progress**: 0/5 stories completed (0%)

## Risk Assessment

### High-Priority Risks

| Risk ID | Description | Impact | Probability | Mitigation |
|---------|-------------|--------|-------------|------------|
| RISK-001 | SDK breaking changes during migration | High | Low | Lock SDK version, gradual migration |
| RISK-002 | Performance regression | High | Medium | Continuous benchmarking, rollback plan |
| RISK-003 | DiffPreview integration conflicts | Medium | Medium | Thorough testing, preserve interfaces |
| RISK-004 | Client compatibility issues | High | Low | Test with Claude Desktop early |
| RISK-005 | Multi-tenant isolation bugs | Critical | Very Low | 100% test coverage, security audit |

### Mitigation Strategies

1. **Phased Migration**: 5 phases allow early detection of issues
2. **Parallel Systems**: Keep old code until SDK fully validated
3. **Feature Flags**: Enable/disable SDK via configuration
4. **Rollback Plan**: Can revert to custom implementation if needed
5.
   **Continuous Testing**: Run tests after each phase

## Dependencies

### Prerequisites

- ✅ Sprint 5 Phase 1-3 infrastructure (Stories 1-12)
- ✅ Custom MCP implementation complete and working
- ✅ DiffPreview service production-ready
- ✅ Multi-tenant security verified

### External Dependencies

- Microsoft .NET MCP SDK v1.0+ (NuGet)
- Claude Desktop 1.0+ (for testing)
- Continue VS Code Extension (for testing)

### Technical Requirements

- .NET 9+ (already installed)
- PostgreSQL 15+ (already configured)
- Redis 7+ (already configured)

## Acceptance Criteria (Epic-Level)

### Functional Requirements

- [ ] All 10 Tools migrated to SDK `[McpTool]` attributes
- [ ] All 11 Resources migrated to SDK `[McpResource]` attributes
- [ ] stdio transport works (Claude Desktop compatible)
- [ ] HTTP/SSE transport works (web client compatible)
- [ ] Diff Preview workflow preserved (no breaking changes)
- [ ] Multi-tenant isolation 100% verified
- [ ] API Key authentication functional
- [ ] Field-level permissions enforced

### Performance Requirements

- [ ] Response time improved by ≥20%
- [ ] Tool execution time < 500ms (P95)
- [ ] Resource query time < 200ms (P95)
- [ ] Throughput ≥100 requests/second
- [ ] Memory usage optimized (no leaks)

### Quality Requirements

- [ ] Test coverage ≥80%
- [ ] Zero CRITICAL security vulnerabilities
- [ ] Zero HIGH security vulnerabilities
- [ ] Code duplication <5%
- [ ] All integration tests pass

### Documentation Requirements

- [ ] Architecture documentation updated
- [ ] API documentation complete
- [ ] Migration guide published
- [ ] Troubleshooting guide published
- [ ] Code examples updated

## Success Metrics

### Code Quality

- **Lines Removed**: 500-700 lines of custom protocol code
- **Code Duplication**: <5%
- **Test Coverage**: ≥80%
- **Security Score**: 0 CRITICAL, 0 HIGH vulnerabilities

### Performance

- **Response Time**: 20-40% improvement
- **Throughput**: 100+ req/s (from 70 req/s)
- **Memory Usage**: 10-20% reduction
- **Cache Hit
  Rate**: >80% maintained

### Developer Experience

- **Onboarding Time**: 50% faster (simpler SDK APIs)
- **Code Readability**: +30% (attributes vs. manual registration)
- **Maintenance Effort**: -60% (Microsoft maintains protocol)

## Related Documents

### Research & Design

- [MCP SDK Integration Research](../research/mcp-sdk-integration-research.md)
- [MCP Server Architecture](../architecture/mcp-server-architecture.md)
- [Hybrid Architecture ADR](../architecture/adr/mcp-sdk-hybrid-approach.md)

### Sprint Planning

- [Sprint 5 Plan](sprint_5.md)
- [Product Roadmap](../../product.md) - M2 section

### Technical References

- [Microsoft .NET MCP SDK](https://github.com/microsoft/mcp-dotnet)
- [MCP Specification](https://spec.modelcontextprotocol.io/)
- [ColaFlow MCP Module](../../colaflow-api/src/ColaFlow.Modules.Mcp/)

---

## Notes

### Why Hybrid Architecture?

**Question**: Why not use 100% SDK?

**Answer**: ColaFlow has unique business requirements:

1. **Diff Preview**: SDK doesn't provide a preview mechanism (ColaFlow custom)
2. **Approval Workflow**: SDK doesn't have human-in-the-loop approval (ColaFlow custom)
3. **Multi-Tenant**: SDK doesn't enforce tenant isolation (ColaFlow custom)
4. **Field Permissions**: SDK doesn't have field-level security (ColaFlow custom)

The hybrid approach gets the **best of both worlds**:

- SDK handles the routine protocol work (60-70% code reduction)
- ColaFlow handles the business-critical parts (security, approval)

### What Gets Deleted?

**Custom Code to Remove** (~700 lines):

- `McpProtocolHandler.cs` (JSON-RPC parsing)
- `McpProtocolMiddleware.cs` (HTTP middleware)
- `IMcpTool.cs` interface (replaced by SDK attributes)
- `IMcpResource.cs` interface (replaced by SDK attributes)
- `McpRegistry.cs` (replaced by SDK discovery)
- `McpRequest.cs` / `McpResponse.cs` DTOs (SDK provides)

**Custom Code to Keep** (~1200 lines):

- `DiffPreviewService.cs` (business logic)
- `PendingChangeService.cs` (approval workflow)
- `ApiKeyAuthHandler.cs` (security)
- `FieldLevelAuthHandler.cs` (permissions)
- `TenantContextService.cs` (multi-tenant)

### Timeline Justification

**Why 8 weeks?**

- **Week 1-2**: PoC + training (can't rush, need to understand SDK)
- **Week 3-4**: 10 Tools migration (careful testing required)
- **Week 5**: 11 Resources migration (simpler than Tools)
- **Week 6**: Transport layer (critical, can't break clients)
- **Week 7-8**: Testing + docs (quality gate, can't skip)

**Could it be faster?**

- Yes, if we skip testing (NOT RECOMMENDED)
- Yes, if we accept higher risk (NOT RECOMMENDED)
- This is already an aggressive timeline (8 weeks across 5 phases, 1.6 weeks per phase on average)

### Post-Migration Benefits

**Developer Velocity**:

- New Tool creation: 30 min (was 2 hours)
- New Resource creation: 15 min (was 1 hour)
- Onboarding new developers: 2 days (was 5 days)

**Maintenance Burden**:

- Protocol updates: 0 hours (Microsoft handles)
- Bug fixes: -60% effort (less custom code)
- Feature additions: +40% faster (SDK simplifies)

---

**Created**: 2025-11-09 by Product Manager Agent
**Epic Owner**: Backend Team Lead
**Estimated Start**: 2025-11-27 (After Sprint 5 Phase 1-3)
**Estimated Completion**: 2026-01-22 (Week 8 of Sprint 5)
**Status**: Not Started (planning complete)
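
### Appendix: Feature Flag Sketch

The "Feature Flags" and "Rollback Plan" mitigations listed under Mitigation Strategies could be wired up roughly as follows. This is a minimal sketch, not ColaFlow's actual design: the `IMcpRequestPipeline` interface, the two pipeline classes, and the `Mcp:UseSdk` configuration key are all hypothetical names introduced for illustration, and no SDK API is called. It only shows the selection logic that keeps the legacy path as the default while both systems run in parallel.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over "handle one MCP request"; not part of ColaFlow or the SDK.
public interface IMcpRequestPipeline
{
    string Handle(string jsonRpcRequest);
}

// Stand-in for the existing custom protocol handler.
public sealed class LegacyPipeline : IMcpRequestPipeline
{
    public string Handle(string jsonRpcRequest) => "legacy:" + jsonRpcRequest;
}

// Stand-in for the SDK-backed path.
public sealed class SdkPipeline : IMcpRequestPipeline
{
    public string Handle(string jsonRpcRequest) => "sdk:" + jsonRpcRequest;
}

public static class McpPipelineSelector
{
    // Assumed configuration key; choose whatever fits the existing settings layout.
    public const string FlagKey = "Mcp:UseSdk";

    public static IMcpRequestPipeline Select(IReadOnlyDictionary<string, string> settings)
    {
        // Default to the legacy path so a missing or malformed flag never enables the SDK by accident.
        bool useSdk = settings.TryGetValue(FlagKey, out var raw)
                      && bool.TryParse(raw, out var parsed)
                      && parsed;
        return useSdk ? new SdkPipeline() : new LegacyPipeline();
    }
}

public static class Program
{
    public static void Main()
    {
        var settings = new Dictionary<string, string> { [McpPipelineSelector.FlagKey] = "true" };
        Console.WriteLine(McpPipelineSelector.Select(settings).Handle("ping")); // prints "sdk:ping"

        settings[McpPipelineSelector.FlagKey] = "false"; // rollback is a configuration change
        Console.WriteLine(McpPipelineSelector.Select(settings).Handle("ping")); // prints "legacy:ping"
    }
}
```

Defaulting to the legacy pipeline when the flag is absent or unparsable is a deliberate choice in this sketch: it means a configuration mistake degrades to the already-validated path rather than to the one still under migration.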