ColaFlow/docs/plans/sprint_5_story_0.md
Yaojia Wang 34a379750f
2025-11-15 08:58:48 +01:00


---
story_id: story_0
sprint_id: sprint_5
status: not_started
priority: P0
assignee: backend
created_date: 2025-11-09
story_type: epic
estimated_weeks: 8
---
# Story 0 (EPIC): Integrate Microsoft .NET MCP SDK
**Type**: Epic / Feature Story
**Priority**: P0 - Critical Infrastructure Improvement
**Estimated Effort**: 8 weeks (40 working days)
## Epic Goal
Migrate ColaFlow from custom MCP implementation to Microsoft's official .NET MCP SDK using a hybrid architecture approach. The SDK will handle protocol layer, Tool/Resource registration, and transport, while ColaFlow retains its unique business logic (Diff Preview, multi-tenant isolation, Pending Changes).
## Business Value
### Why This Matters
1. **Code Reduction**: 60-70% less boilerplate code (protocol parsing, JSON-RPC, handshake)
2. **Performance Gain**: An estimated 30-40% faster response times from SDK optimizations (the committed target under Success Metrics is ≥ 20%)
3. **Maintenance**: Microsoft-maintained protocol updates (no manual updates)
4. **Standard Compliance**: 100% MCP specification compliance guaranteed
5. **Developer Experience**: Attribute-based registration (cleaner, more intuitive)
### Success Metrics
- **Code Reduction**: Remove 500-700 lines of custom protocol code
- **Performance**: ≥ 20% response time improvement
- **Test Coverage**: Maintain ≥ 80% coverage
- **Zero Breaking Changes**: All existing MCP clients work without changes
- **SDK Integration**: 100% of Tools and Resources migrated
## Research Context
**Research Report**: `docs/research/mcp-sdk-integration-research.md`
Key findings from research team:
- **SDK Maturity**: Production-ready (v1.0+), 4000+ GitHub stars
- **Architecture Fit**: Excellent fit with ColaFlow's Clean Architecture
- **Attribute System**: `[McpTool]`, `[McpResource]` attributes simplify registration
- **Transport Options**: stdio (CLI), HTTP/SSE (Server), WebSocket (future)
- **Performance**: Faster JSON parsing, optimized middleware
- **Compatibility**: Supports Claude Desktop, Continue, Cline
## Hybrid Architecture Strategy
### What SDK Handles (Replace Custom Code)
```
┌──────────────────────────────────────┐
│ Microsoft .NET MCP SDK │
├──────────────────────────────────────┤
│ ✅ Protocol Layer │
│ - JSON-RPC 2.0 parsing │
│ - MCP handshake (initialize) │
│ - Request/response routing │
│ - Error handling │
│ │
│ ✅ Transport Layer │
│ - stdio (Standard In/Out) │
│ - HTTP/SSE (Server-Sent Events) │
│ - WebSocket (future) │
│ │
│ ✅ Registration System │
│ - Attribute-based discovery │
│ - Tool/Resource/Prompt catalog │
│ - Schema validation │
└──────────────────────────────────────┘
```
### What ColaFlow Keeps (Business Logic)
```
┌──────────────────────────────────────┐
│ ColaFlow Business Layer │
├──────────────────────────────────────┤
│ 🔒 Security & Multi-Tenant │
│ - TenantContext extraction │
│ - API Key authentication │
│ - Field-level permissions │
│ │
│ 🔍 Diff Preview System │
│ - Before/after snapshots │
│ - Changed fields detection │
│ - HTML diff generation │
│ │
│ ✅ Approval Workflow │
│ - PendingChange management │
│ - Human approval required │
│ - SignalR notifications │
│ │
│ 📊 Advanced Features │
│ - Redis caching │
│ - Audit logging │
│ - Rate limiting │
└──────────────────────────────────────┘
```
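The "changed fields detection" step that ColaFlow keeps can be illustrated with a small sketch. This is not the actual `DiffPreviewService` code: the `SnapshotDiff` class, its method name, and the dictionary-based snapshot shape are assumptions made only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the real DiffPreviewService API is not shown in this document.
public static class SnapshotDiff
{
    // Returns the names of fields whose values differ between the two snapshots.
    // A field that exists in only one snapshot is reported as changed.
    public static IReadOnlyList<string> ChangedFields(
        IReadOnlyDictionary<string, object?> before,
        IReadOnlyDictionary<string, object?> after)
    {
        return before.Keys
            .Union(after.Keys)
            .Where(key =>
                !before.TryGetValue(key, out var oldValue) ||
                !after.TryGetValue(key, out var newValue) ||
                !object.Equals(oldValue, newValue))
            .OrderBy(key => key, StringComparer.Ordinal)
            .ToList();
    }
}
```

Because this logic depends only on the before/after snapshots, it should be able to sit behind the SDK-registered Tools without knowing which protocol layer invoked it.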
## Migration Phases (8 Weeks)
### Phase 1: Foundation (Week 1-2) - Story 13
**Goal**: Set up SDK infrastructure and validate compatibility
**Tasks**:
1. Install `Microsoft.MCP` NuGet package
2. Create PoC Tool/Resource using SDK
3. Verify compatibility with existing architecture
4. Performance baseline benchmarks
5. Team training on SDK APIs
**Deliverables**:
- SDK installed and configured
- PoC validates SDK works with ColaFlow
- Performance baseline report
- Migration guide for developers
**Acceptance Criteria**:
- [ ] SDK integrated into ColaFlow.Modules.Mcp project
- [ ] PoC Tool successfully called from Claude Desktop
- [ ] Performance baseline recorded (response time, throughput)
- [ ] Zero conflicts with existing Clean Architecture
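For the performance baseline, the P95 figures and the percentage improvement used throughout this plan could be computed along these lines. This is a minimal sketch: `LatencyStats` and its methods are hypothetical names, and the nearest-rank percentile definition is an assumption, since the plan does not say which percentile method the benchmarks use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for summarizing benchmark samples (latencies in milliseconds).
public static class LatencyStats
{
    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it.
    public static double Percentile(IEnumerable<double> samplesMs, double p)
    {
        var sorted = samplesMs.OrderBy(x => x).ToArray();
        if (sorted.Length == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samplesMs));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p), "Percentile must be in (0, 100].");

        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[rank - 1];
    }

    // Relative latency improvement versus the baseline, in percent.
    // Positive means the current measurement is faster than the baseline.
    public static double ImprovementPercent(double baselineMs, double currentMs)
    {
        if (baselineMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(baselineMs), "Baseline must be positive.");
        return (baselineMs - currentMs) * 100.0 / baselineMs;
    }
}
```

Recording the baseline with the same function that later phases use keeps the "≥ 20% improvement" comparison apples to apples.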
---
### Phase 2: Tool Migration (Week 3-4) - Story 14
**Goal**: Migrate all 10 Tools to SDK attribute-based registration
**Tools to Migrate**:
1. `create_issue` → `[McpTool]` attribute
2. `update_status` → `[McpTool]` attribute
3. `add_comment` → `[McpTool]` attribute
4. `assign_issue` → `[McpTool]` attribute
5. `create_sprint` → `[McpTool]` attribute
6. `update_sprint` → `[McpTool]` attribute
7. `log_decision` → `[McpTool]` attribute
8. `generate_prd` → `[McpTool]` attribute
9. `split_epic` → `[McpTool]` attribute
10. `detect_risks` → `[McpTool]` attribute
**Migration Pattern**:
```csharp
// BEFORE (Custom)
public class CreateIssueTool : IMcpTool
{
    public string Name => "create_issue";
    public string Description => "Create a new issue";
    public McpToolInputSchema InputSchema => ...;

    public async Task<McpToolResult> ExecuteAsync(...)
    {
        // Custom routing logic
    }
}

// AFTER (SDK)
[McpTool(
    Name = "create_issue",
    Description = "Create a new issue (Epic/Story/Task)"
)]
public class CreateIssueTool
{
    [McpToolParameter(Required = true)]
    public Guid ProjectId { get; set; }

    [McpToolParameter(Required = true)]
    public string Title { get; set; }

    [McpToolParameter]
    public string? Description { get; set; }

    public async Task<McpToolResult> ExecuteAsync(
        McpContext context,
        CancellationToken cancellationToken)
    {
        // Business logic stays the same
        // DiffPreviewService integration preserved
    }
}
```
**Deliverables**:
- All 10 Tools migrated to SDK attributes
- DiffPreviewService integration maintained
- Integration tests updated
- Performance comparison report
**Acceptance Criteria**:
- [ ] All Tools work with `[McpTool]` attribute
- [ ] Diff Preview workflow preserved (no breaking changes)
- [ ] Integration tests pass (>80% coverage)
- [ ] Performance improvement measured (target: 20%+)
---
### Phase 3: Resource Migration (Week 5) - Story 15
**Goal**: Migrate all 11 Resources to SDK attribute-based registration
**Resources to Migrate**:
1. `projects.list` → `[McpResource]`
2. `projects.get/{id}` → `[McpResource]`
3. `issues.search` → `[McpResource]`
4. `issues.get/{id}` → `[McpResource]`
5. `sprints.current` → `[McpResource]`
6. `sprints.list` → `[McpResource]`
7. `users.list` → `[McpResource]`
8. `docs.prd/{projectId}` → `[McpResource]`
9. `reports.daily/{date}` → `[McpResource]`
10. `reports.velocity` → `[McpResource]`
11. `audit.history/{entityId}` → `[McpResource]`
**Migration Pattern**:
```csharp
// BEFORE (Custom)
public class ProjectsListResource : IMcpResource
{
    public string Uri => "colaflow://projects.list";
    public string Name => "Projects List";

    public async Task<McpResourceContent> GetContentAsync(...)
    {
        // Custom logic
    }
}

// AFTER (SDK)
[McpResource(
    Uri = "colaflow://projects.list",
    Name = "Projects List",
    Description = "List all projects in current tenant",
    MimeType = "application/json"
)]
public class ProjectsListResource
{
    private readonly IProjectRepository _repo;
    private readonly ITenantContext _tenant;
    private readonly IDistributedCache _cache; // Redis-backed distributed cache preserved

    public async Task<McpResourceContent> GetContentAsync(
        McpContext context,
        CancellationToken cancellationToken)
    {
        // Business logic stays the same
        // Multi-tenant filtering preserved
        // Redis caching preserved
    }
}
```
**Deliverables**:
- All 11 Resources migrated to SDK attributes
- Multi-tenant isolation verified
- Redis caching maintained
- Performance tests passed
**Acceptance Criteria**:
- [ ] All Resources work with `[McpResource]` attribute
- [ ] Multi-tenant isolation 100% verified
- [ ] Redis cache hit rate > 80% maintained
- [ ] Response time < 200ms (P95)
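The multi-tenant filtering and caching that must survive the Resource migration can be sketched as follows. This is an illustration under stated assumptions, not ColaFlow's implementation: `ProjectRow`, `TenantScope`, and the cache key format are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type; the real project entity is not shown in this document.
public sealed record ProjectRow(Guid TenantId, string Name);

public static class TenantScope
{
    // Returns only the rows that belong to the given tenant.
    public static IReadOnlyList<ProjectRow> ForTenant(
        IEnumerable<ProjectRow> rows, Guid tenantId)
        => rows.Where(row => row.TenantId == tenantId).ToList();

    // Builds a cache key that embeds the tenant id, so a cached Resource
    // response for one tenant can never be served to another tenant.
    public static string CacheKey(Guid tenantId, string resourceUri)
        => $"tenant:{tenantId:N}:{resourceUri}";
}
```

An isolation test for Phase 3 could then assert that `ForTenant` never returns a row with a foreign `TenantId` and that two tenants never produce the same cache key for the same Resource URI.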
---
### Phase 4: Transport Layer (Week 6) - Story 16
**Goal**: Replace custom HTTP middleware with SDK transport
**Current Custom Transport**:
```csharp
// Custom middleware (will be removed)
app.UseMiddleware<McpProtocolMiddleware>();
app.UseMiddleware<ApiKeyAuthMiddleware>();
```
**SDK Transport Configuration**:
```csharp
// SDK-based transport
builder.Services.AddMcpServer(options =>
{
    // stdio transport (for CLI tools like Claude Desktop)
    options.UseStdioTransport();

    // HTTP/SSE transport (for web-based clients)
    options.UseHttpTransport(http =>
    {
        http.BasePath = "/mcp";
        http.EnableSse = true; // Server-Sent Events
    });

    // Custom authentication (preserve API Key auth)
    options.AddAuthentication<ApiKeyAuthHandler>();

    // Custom authorization (preserve field-level permissions)
    options.AddAuthorization<FieldLevelAuthHandler>();
});
```
**Deliverables**:
- Custom middleware removed
- SDK transport configured (stdio + HTTP/SSE)
- API Key authentication migrated to SDK pipeline
- Field-level permissions preserved
**Acceptance Criteria**:
- [ ] stdio transport works (Claude Desktop compatibility)
- [ ] HTTP/SSE transport works (web client compatibility)
- [ ] API Key authentication functional
- [ ] Field-level permissions enforced
- [ ] Zero breaking changes for existing clients
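Whichever pipeline hosts it, the API Key check itself is transport-independent. The sketch below shows one way such a check might look. It assumes keys are stored as SHA-256 hashes and compared in constant time; this document does not describe how `ApiKeyAuthHandler` actually stores or validates keys, and `ApiKeyCheck` is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; storage format (SHA-256 hash of the key) is an assumption.
public static class ApiKeyCheck
{
    // Hashes an API key so that only the hash needs to be stored.
    public static byte[] Hash(string apiKey)
        => SHA256.HashData(Encoding.UTF8.GetBytes(apiKey));

    // Compares the presented key against the stored hash.
    // FixedTimeEquals avoids leaking how many leading bytes matched.
    public static bool Matches(string presentedKey, byte[] storedHash)
        => CryptographicOperations.FixedTimeEquals(Hash(presentedKey), storedHash);
}
```

Keeping the comparison in a small pure function like this makes it straightforward to reuse the same check from the old middleware and the new SDK pipeline while both exist.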
---
### Phase 5: Testing & Optimization (Week 7-8) - Story 17
**Goal**: Comprehensive testing, performance tuning, and documentation
#### Week 7: Integration Testing & Bug Fixes
**Tasks**:
1. **End-to-End Testing**
- Claude Desktop integration test (stdio)
- Web client integration test (HTTP/SSE)
- Multi-tenant isolation verification
- Diff Preview workflow validation
2. **Performance Testing**
- Benchmark Tools (target: 20%+ improvement)
- Benchmark Resources (target: 30%+ improvement)
- Concurrent request testing (100 req/s)
- Memory usage profiling
3. **Security Audit**
- API Key brute force test
- Cross-tenant access attempts
- Field-level permission bypass tests
- SQL injection attempts
4. **Bug Fixes**
- Fix integration test failures
- Address performance bottlenecks
- Fix security vulnerabilities (if found)
#### Week 8: Documentation & Production Readiness
**Tasks**:
1. **Architecture Documentation**
- Update `mcp-server-architecture.md` with SDK details
- Create SDK migration guide for developers
- Document hybrid architecture decisions
- Add troubleshooting guide
2. **API Documentation**
- Update OpenAPI/Swagger specs
- Document Tool parameter schemas
- Document Resource URI patterns
- Add example requests/responses
3. **Code Cleanup**
- Remove old custom protocol code
- Delete obsolete interfaces (IMcpTool, IMcpResource)
- Clean up unused NuGet packages
- Update code comments
4. **Production Readiness**
- Deploy to staging environment
- Smoke testing with real AI clients
- Performance validation
- Final code review
**Deliverables**:
- Comprehensive test suite (>80% coverage)
- Performance report (vs. baseline)
- Security audit report (zero CRITICAL issues)
- Updated architecture documentation
- Production deployment guide
**Acceptance Criteria**:
- [ ] Integration tests pass (>80% coverage)
- [ ] Performance improved by ≥20%
- [ ] Security audit clean (0 CRITICAL, 0 HIGH)
- [ ] Documentation complete and reviewed
- [ ] Production-ready checklist signed off
---
## Stories Breakdown
This Epic is broken down into 5 child Stories:
- [ ] [Story 13](sprint_5_story_13.md) - MCP SDK Foundation & PoC (Week 1-2) - `not_started`
- [ ] [Story 14](sprint_5_story_14.md) - Tool Migration to SDK (Week 3-4) - `not_started`
- [ ] [Story 15](sprint_5_story_15.md) - Resource Migration to SDK (Week 5) - `not_started`
- [ ] [Story 16](sprint_5_story_16.md) - Transport Layer Migration (Week 6) - `not_started`
- [ ] [Story 17](sprint_5_story_17.md) - Testing & Optimization (Week 7-8) - `not_started`
**Progress**: 0/5 stories completed (0%)
## Risk Assessment
### High-Priority Risks
| Risk ID | Description | Impact | Probability | Mitigation |
|---------|-------------|--------|-------------|------------|
| RISK-001 | SDK breaking changes during migration | High | Low | Lock SDK version, gradual migration |
| RISK-002 | Performance regression | High | Medium | Continuous benchmarking, rollback plan |
| RISK-003 | DiffPreview integration conflicts | Medium | Medium | Thorough testing, preserve interfaces |
| RISK-004 | Client compatibility issues | High | Low | Test with Claude Desktop early |
| RISK-005 | Multi-tenant isolation bugs | Critical | Very Low | 100% test coverage, security audit |
### Mitigation Strategies
1. **Phased Migration**: 5 phases allow early detection of issues
2. **Parallel Systems**: Keep old code until SDK fully validated
3. **Feature Flags**: Enable/disable SDK via configuration
4. **Rollback Plan**: Can revert to custom implementation if needed
5. **Continuous Testing**: Run tests after each phase
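The feature-flag and parallel-systems strategies can be combined into a single switch point. The following is a minimal sketch: the `Mcp:UseSdk` setting name and the `SdkFeatureFlag` class are hypothetical, and the settings are modeled as a plain dictionary rather than any particular configuration API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical feature flag; the setting name is an assumption for illustration.
public static class SdkFeatureFlag
{
    public const string Key = "Mcp:UseSdk";

    // The SDK path is enabled only when the setting is present and parses to true.
    // A missing or malformed value falls back to the legacy implementation.
    public static bool IsEnabled(IReadOnlyDictionary<string, string?> settings)
        => settings.TryGetValue(Key, out var raw)
           && bool.TryParse(raw, out var enabled)
           && enabled;

    // Picks between the legacy and SDK-based implementation of some component.
    public static T Select<T>(
        IReadOnlyDictionary<string, string?> settings, T legacy, T sdk)
        => IsEnabled(settings) ? sdk : legacy;
}
```

Defaulting to the legacy path when the flag is absent means a misconfigured environment degrades to the known-good implementation, which is also what the rollback plan relies on.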
## Dependencies
### Prerequisites
- ✅ Sprint 5 Phase 1-3 infrastructure (Stories 1-12)
- ✅ Custom MCP implementation complete and working
- ✅ DiffPreview service production-ready
- ✅ Multi-tenant security verified
### External Dependencies
- Microsoft .NET MCP SDK v1.0+ (NuGet)
- Claude Desktop 1.0+ (for testing)
- Continue VS Code Extension (for testing)
### Technical Requirements
- .NET 9+ (already installed)
- PostgreSQL 15+ (already configured)
- Redis 7+ (already configured)
## Acceptance Criteria (Epic-Level)
### Functional Requirements
- [ ] All 10 Tools migrated to SDK `[McpTool]` attributes
- [ ] All 11 Resources migrated to SDK `[McpResource]` attributes
- [ ] stdio transport works (Claude Desktop compatible)
- [ ] HTTP/SSE transport works (web client compatible)
- [ ] Diff Preview workflow preserved (no breaking changes)
- [ ] Multi-tenant isolation 100% verified
- [ ] API Key authentication functional
- [ ] Field-level permissions enforced
### Performance Requirements
- [ ] Response time improved by ≥20%
- [ ] Tool execution time < 500ms (P95)
- [ ] Resource query time < 200ms (P95)
- [ ] Throughput ≥ 100 requests/second
- [ ] Memory usage optimized (no leaks)
### Quality Requirements
- [ ] Test coverage ≥ 80%
- [ ] Zero CRITICAL security vulnerabilities
- [ ] Zero HIGH security vulnerabilities
- [ ] Code duplication <5%
- [ ] All integration tests pass
### Documentation Requirements
- [ ] Architecture documentation updated
- [ ] API documentation complete
- [ ] Migration guide published
- [ ] Troubleshooting guide published
- [ ] Code examples updated
## Success Metrics
### Code Quality
- **Lines Removed**: 500-700 lines of custom protocol code
- **Code Duplication**: <5%
- **Test Coverage**: ≥ 80%
- **Security Score**: 0 CRITICAL, 0 HIGH vulnerabilities
### Performance
- **Response Time**: 20-40% improvement
- **Throughput**: 100+ req/s (from 70 req/s)
- **Memory Usage**: 10-20% reduction
- **Cache Hit Rate**: >80% maintained
### Developer Experience
- **Onboarding Time**: 50% faster (simpler SDK APIs)
- **Code Readability**: +30% (attributes vs. manual registration)
- **Maintenance Effort**: -60% (Microsoft maintains protocol)
## Related Documents
### Research & Design
- [MCP SDK Integration Research](../research/mcp-sdk-integration-research.md)
- [MCP Server Architecture](../architecture/mcp-server-architecture.md)
- [Hybrid Architecture ADR](../architecture/adr/mcp-sdk-hybrid-approach.md)
### Sprint Planning
- [Sprint 5 Plan](sprint_5.md)
- [Product Roadmap](../../product.md) - M2 section
### Technical References
- [Microsoft .NET MCP SDK](https://github.com/microsoft/mcp-dotnet)
- [MCP Specification](https://spec.modelcontextprotocol.io/)
- [ColaFlow MCP Module](../../colaflow-api/src/ColaFlow.Modules.Mcp/)
---
## Notes
### Why Hybrid Architecture?
**Question**: Why not use 100% SDK?
**Answer**: ColaFlow has unique business requirements:
1. **Diff Preview**: SDK doesn't provide preview mechanism (ColaFlow custom)
2. **Approval Workflow**: SDK doesn't have human-in-the-loop (ColaFlow custom)
3. **Multi-Tenant**: SDK doesn't enforce tenant isolation (ColaFlow custom)
4. **Field Permissions**: SDK doesn't have field-level security (ColaFlow custom)
The hybrid approach gets the **best of both worlds**:
- SDK handles boring protocol stuff (60-70% code reduction)
- ColaFlow handles business-critical stuff (security, approval)
### What Gets Deleted?
**Custom Code to Remove** (~700 lines):
- `McpProtocolHandler.cs` (JSON-RPC parsing)
- `McpProtocolMiddleware.cs` (HTTP middleware)
- `IMcpTool.cs` interface (replaced by SDK attributes)
- `IMcpResource.cs` interface (replaced by SDK attributes)
- `McpRegistry.cs` (replaced by SDK discovery)
- `McpRequest.cs` / `McpResponse.cs` DTOs (SDK provides)
**Custom Code to Keep** (~1200 lines):
- `DiffPreviewService.cs` (business logic)
- `PendingChangeService.cs` (approval workflow)
- `ApiKeyAuthHandler.cs` (security)
- `FieldLevelAuthHandler.cs` (permissions)
- `TenantContextService.cs` (multi-tenant)
### Timeline Justification
**Why 8 weeks?**
- **Week 1-2**: PoC + training (can't rush, need to understand SDK)
- **Week 3-4**: 10 Tools migration (careful testing required)
- **Week 5**: 11 Resources migration (simpler than Tools)
- **Week 6**: Transport layer (critical, can't break clients)
- **Week 7-8**: Testing + docs (quality gate, can't skip)
**Could it be faster?**
- Yes, if we skip testing (NOT RECOMMENDED)
- Yes, if we accept higher risk (NOT RECOMMENDED)
- This is already an aggressive timeline (8 weeks across 5 phases, an average of 1.6 weeks per phase)
### Post-Migration Benefits
**Developer Velocity**:
- New Tool creation: 30 min (was 2 hours)
- New Resource creation: 15 min (was 1 hour)
- Onboarding new developers: 2 days (was 5 days)
**Maintenance Burden**:
- Protocol updates: 0 hours (Microsoft handles)
- Bug fixes: -60% effort (less custom code)
- Feature additions: +40% faster (SDK simplifies)
---
**Created**: 2025-11-09 by Product Manager Agent
**Epic Owner**: Backend Team Lead
**Estimated Start**: 2025-11-27 (After Sprint 5 Phase 1-3)
**Estimated Completion**: 2026-01-22 (Week 8 of Sprint 5)
**Status**: Not Started (planning complete)