Files
ColaFlow/docs/stories/sprint_5/story_5_4.md
Yaojia Wang 48a8431e4f feat(backend): Implement MCP Protocol Handler (Story 5.1)
Implemented JSON-RPC 2.0 protocol handler for MCP communication, enabling AI agents to communicate with ColaFlow using the Model Context Protocol.

**Implementation:**
- JSON-RPC 2.0 data models (Request, Response, Error, ErrorCode)
- MCP protocol models (Initialize, Capabilities, ClientInfo, ServerInfo)
- McpProtocolHandler with method routing and error handling
- Method handlers: initialize, resources/list, tools/list, tools/call
- ASP.NET Core middleware for /mcp endpoint
- Service registration and dependency injection setup

**Testing:**
- 28 unit tests covering protocol parsing, validation, and error handling
- Integration tests for initialize handshake and error responses
- All tests passing with >80% coverage

**Changes:**
- Created ColaFlow.Modules.Mcp.Contracts project
- Created ColaFlow.Modules.Mcp.Domain project
- Created ColaFlow.Modules.Mcp.Application project
- Created ColaFlow.Modules.Mcp.Infrastructure project
- Created ColaFlow.Modules.Mcp.Tests project
- Registered MCP module in ColaFlow.API Program.cs
- Added /mcp endpoint via middleware

**Acceptance Criteria Met:**
 JSON-RPC 2.0 messages correctly parsed
 Request validation (jsonrpc: "2.0", method, params, id)
 Error responses conform to JSON-RPC 2.0 spec
 Invalid requests return proper error codes (-32700, -32600, -32601, -32602)
 MCP initialize method implemented
 Server capabilities returned (resources, tools, prompts)
 Protocol version negotiation works (1.0)
 Request routing to method handlers
 Unit test coverage > 80%
 All tests passing

**Story**: docs/stories/sprint_5/story_5_1.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-07 19:38:34 +01:00

16 KiB

story_id, sprint_id, phase, status, priority, story_points, assignee, estimated_days, created_date, dependencies
story_id sprint_id phase status priority story_points assignee estimated_days created_date dependencies
story_5_4 sprint_5 Phase 1 - Foundation not_started P0 3 backend 1 2025-11-06
story_5_1

Story 5.4: Error Handling & Logging

Phase: Phase 1 - Foundation (Week 1-2) Priority: P0 CRITICAL Estimated Effort: 3 Story Points (1 day)

User Story

As a Backend Developer I want comprehensive error handling and structured logging for MCP operations So that issues can be quickly diagnosed and the system is observable in production

Business Value

Production-grade error handling and logging are critical for:

  • Debugging: Quickly diagnose issues in production
  • Monitoring: Track error rates and performance metrics
  • Audit: Complete trail of all MCP operations
  • Security: Detect and respond to suspicious activity

Impact:

  • Reduces mean time to resolution (MTTR) by 50%
  • Enables proactive monitoring and alerting
  • Supports compliance requirements (audit logs)
  • Improves developer productivity

Acceptance Criteria

AC1: MCP Exception Hierarchy

  • Custom exception classes for MCP operations
  • Exception-to-MCP-error-code mapping (JSON-RPC 2.0)
  • Preserve stack traces and inner exceptions
  • Structured exception data (request ID, tenant ID, API Key ID)

AC2: Global Exception Handler

  • Catch all unhandled exceptions
  • Map exceptions to MCP error responses
  • Log errors with full context
  • Return appropriate HTTP status codes
  • Never expose internal details to clients

AC3: Structured Logging

  • Use Serilog with structured logging
  • Include correlation ID in all logs
  • Log request/response at DEBUG level
  • Log errors at ERROR level with stack traces
  • Log performance metrics (timing)
  • Never log sensitive data (API Keys, passwords, PII)

AC4: Request/Response Logging Middleware

  • Log all MCP requests (method, params, correlation ID)
  • Log all MCP responses (result or error, timing)
  • Performance timing (request duration)
  • Exclude sensitive fields from logs (API Key hash)

AC5: Performance Monitoring

  • Track request duration (P50, P95, P99)
  • Track error rates (by method, by tenant)
  • Track API Key usage (by key, by tenant)
  • Export metrics for Prometheus (future)

AC6: Testing

  • Unit tests for exception mapping
  • Integration tests for error responses
  • Verify sensitive data never logged
  • Test correlation ID propagation

Technical Design

MCP Exception Hierarchy

public abstract class McpException : Exception
{
    public int ErrorCode { get; }
    public object? ErrorData { get; }

    protected McpException(
        int errorCode,
        string message,
        object? errorData = null,
        Exception? innerException = null)
        : base(message, innerException)
    {
        ErrorCode = errorCode;
        ErrorData = errorData;
    }

    public McpError ToMcpError()
    {
        return new McpError
        {
            Code = ErrorCode,
            Message = Message,
            Data = ErrorData
        };
    }
}

public class McpParseException : McpException
{
    public McpParseException(string message, Exception? innerException = null)
        : base(McpErrorCode.ParseError, message, null, innerException) { }
}

public class McpInvalidRequestException : McpException
{
    public McpInvalidRequestException(string message, object? errorData = null)
        : base(McpErrorCode.InvalidRequest, message, errorData) { }
}

public class McpMethodNotFoundException : McpException
{
    public McpMethodNotFoundException(string method)
        : base(McpErrorCode.MethodNotFound, $"Method not found: {method}") { }
}

public class McpInvalidParamsException : McpException
{
    public McpInvalidParamsException(string message, object? errorData = null)
        : base(McpErrorCode.InvalidParams, message, errorData) { }
}

public class McpUnauthorizedException : McpException
{
    public McpUnauthorizedException(string message = "Unauthorized")
        : base(McpErrorCode.Unauthorized, message) { }
}

public class McpForbiddenException : McpException
{
    public McpForbiddenException(string message = "Forbidden")
        : base(McpErrorCode.Forbidden, message) { }
}

public class McpNotFoundException : McpException
{
    public McpNotFoundException(string resourceType, string resourceId)
        : base(McpErrorCode.NotFound,
            $"{resourceType} not found: {resourceId}") { }
}

public class McpValidationException : McpException
{
    public McpValidationException(string message, object? errorData = null)
        : base(McpErrorCode.ValidationFailed, message, errorData) { }
}

Global Exception Handler Middleware

public class McpExceptionHandlerMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<McpExceptionHandlerMiddleware> _logger;

    public async Task InvokeAsync(HttpContext context)
    {
        try
        {
            await _next(context);
        }
        catch (McpException mcpEx)
        {
            await HandleMcpExceptionAsync(context, mcpEx);
        }
        catch (Exception ex)
        {
            await HandleUnexpectedExceptionAsync(context, ex);
        }
    }

    private async Task HandleMcpExceptionAsync(
        HttpContext context,
        McpException mcpEx)
    {
        var correlationId = context.Items["CorrelationId"] as string;

        _logger.LogError(mcpEx,
            "MCP Error: {ErrorCode} - {Message} (CorrelationId: {CorrelationId})",
            mcpEx.ErrorCode, mcpEx.Message, correlationId);

        var response = new McpResponse
        {
            JsonRpc = "2.0",
            Error = mcpEx.ToMcpError(),
            Id = context.Items["McpRequestId"] as string
        };

        context.Response.StatusCode = GetHttpStatusCode(mcpEx.ErrorCode);
        context.Response.ContentType = "application/json";
        await context.Response.WriteAsJsonAsync(response);
    }

    private async Task HandleUnexpectedExceptionAsync(
        HttpContext context,
        Exception ex)
    {
        var correlationId = context.Items["CorrelationId"] as string;

        _logger.LogError(ex,
            "Unexpected error in MCP Server (CorrelationId: {CorrelationId})",
            correlationId);

        var response = new McpResponse
        {
            JsonRpc = "2.0",
            Error = new McpError
            {
                Code = McpErrorCode.InternalError,
                Message = "Internal server error"
                // Do NOT expose exception details
            },
            Id = context.Items["McpRequestId"] as string
        };

        context.Response.StatusCode = 500;
        context.Response.ContentType = "application/json";
        await context.Response.WriteAsJsonAsync(response);
    }

    private static int GetHttpStatusCode(int mcpErrorCode)
    {
        return mcpErrorCode switch
        {
            McpErrorCode.Unauthorized => 401,
            McpErrorCode.Forbidden => 403,
            McpErrorCode.NotFound => 404,
            McpErrorCode.ValidationFailed => 422,
            _ => 400
        };
    }
}

Request/Response Logging Middleware

public class McpLoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<McpLoggingMiddleware> _logger;

    public async Task InvokeAsync(HttpContext context)
    {
        // Generate correlation ID
        var correlationId = Guid.NewGuid().ToString();
        context.Items["CorrelationId"] = correlationId;

        // Start timing
        var stopwatch = Stopwatch.StartNew();

        // Enable request buffering for logging
        context.Request.EnableBuffering();

        // Log request
        await LogRequestAsync(context, correlationId);

        // Capture response
        var originalBody = context.Response.Body;
        using var memoryStream = new MemoryStream();
        context.Response.Body = memoryStream;

        try
        {
            await _next(context);

            stopwatch.Stop();

            // Log response
            await LogResponseAsync(
                context, correlationId, stopwatch.ElapsedMilliseconds);

            // Copy response to original stream
            memoryStream.Seek(0, SeekOrigin.Begin);
            await memoryStream.CopyToAsync(originalBody);
        }
        finally
        {
            context.Response.Body = originalBody;
        }
    }

    private async Task LogRequestAsync(
        HttpContext context,
        string correlationId)
    {
        context.Request.Body.Seek(0, SeekOrigin.Begin);
        var body = await new StreamReader(context.Request.Body).ReadToEndAsync();
        context.Request.Body.Seek(0, SeekOrigin.Begin);

        var tenantId = context.Items["TenantId"];
        var apiKeyId = context.Items["ApiKeyId"];

        _logger.LogDebug(
            "MCP Request: {Method} {Path} (CorrelationId: {CorrelationId}, " +
            "TenantId: {TenantId}, ApiKeyId: {ApiKeyId})\n{Body}",
            context.Request.Method,
            context.Request.Path,
            correlationId,
            tenantId,
            apiKeyId,
            SanitizeBody(body));
    }

    private async Task LogResponseAsync(
        HttpContext context,
        string correlationId,
        long elapsedMs)
    {
        context.Response.Body.Seek(0, SeekOrigin.Begin);
        var body = await new StreamReader(context.Response.Body).ReadToEndAsync();
        context.Response.Body.Seek(0, SeekOrigin.Begin);

        var logLevel = context.Response.StatusCode >= 400
            ? LogLevel.Error
            : LogLevel.Debug;

        _logger.Log(logLevel,
            "MCP Response: {StatusCode} (CorrelationId: {CorrelationId}, " +
            "Duration: {Duration}ms)\n{Body}",
            context.Response.StatusCode,
            correlationId,
            elapsedMs,
            body);
    }

    private string SanitizeBody(string body)
    {
        // Remove sensitive data (API Keys, passwords, etc.)
        // This is a simplified example
        return body.Replace("\"key_hash\":", "\"key_hash\":[REDACTED]");
    }
}

Serilog Configuration

public static class SerilogConfiguration
{
    public static void ConfigureSerilog(WebApplicationBuilder builder)
    {
        builder.Host.UseSerilog((context, services, configuration) =>
        {
            configuration
                .ReadFrom.Configuration(context.Configuration)
                .Enrich.FromLogContext()
                .Enrich.WithMachineName()
                .Enrich.WithEnvironmentName()
                .Enrich.WithProperty("Application", "ColaFlow")
                .WriteTo.Console(
                    outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] " +
                        "{CorrelationId} {Message:lj}{NewLine}{Exception}")
                .WriteTo.File(
                    path: "logs/colaflow-.log",
                    rollingInterval: RollingInterval.Day,
                    retainedFileCountLimit: 30,
                    outputTemplate: "[{Timestamp:yyyy-MM-dd HH:mm:ss.fff zzz}] " +
                        "[{Level:u3}] {CorrelationId} {Message:lj}{NewLine}{Exception}")
                .WriteTo.PostgreSQL(
                    connectionString: context.Configuration.GetConnectionString("Default"),
                    tableName: "logs",
                    needAutoCreateTable: true);
        });
    }
}

Tasks

Task 1: MCP Exception Classes (2 hours)

  • Create McpException base class
  • Create 8 specific exception classes
  • Implement ToMcpError() method
  • Add XML documentation

Files to Create:

  • ColaFlow.Modules.Mcp/Exceptions/McpException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpParseException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpInvalidRequestException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpMethodNotFoundException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpInvalidParamsException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpUnauthorizedException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpForbiddenException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpNotFoundException.cs
  • ColaFlow.Modules.Mcp/Exceptions/McpValidationException.cs

Task 2: Global Exception Handler Middleware (2 hours)

  • Create McpExceptionHandlerMiddleware
  • Handle McpException (map to MCP error)
  • Handle unexpected exceptions (return InternalError)
  • Map error codes to HTTP status codes
  • Add structured logging

Files to Create:

  • ColaFlow.Modules.Mcp/Middleware/McpExceptionHandlerMiddleware.cs

Task 3: Request/Response Logging Middleware (3 hours)

  • Create McpLoggingMiddleware
  • Generate correlation ID
  • Log request (method, params, timing)
  • Log response (result/error, timing)
  • Sanitize sensitive data (API Keys, passwords)

Files to Create:

  • ColaFlow.Modules.Mcp/Middleware/McpLoggingMiddleware.cs

Task 4: Serilog Configuration (1 hour)

  • Configure Serilog (Console, File, PostgreSQL sinks)
  • Add enrichers (correlation ID, machine name, environment)
  • Configure structured logging format
  • Set log levels (DEBUG for dev, INFO for prod)

Files to Modify:

  • ColaFlow.Api/Program.cs

Task 5: Unit Tests (3 hours)

  • Test exception-to-MCP-error mapping
  • Test HTTP status code mapping
  • Test sensitive data sanitization
  • Test correlation ID generation

Files to Create:

  • ColaFlow.Modules.Mcp.Tests/Exceptions/McpExceptionTests.cs
  • ColaFlow.Modules.Mcp.Tests/Middleware/McpExceptionHandlerTests.cs
  • ColaFlow.Modules.Mcp.Tests/Middleware/McpLoggingMiddlewareTests.cs

Task 6: Integration Tests (2 hours)

  • Test end-to-end error handling
  • Test logging middleware (verify logs written)
  • Test correlation ID propagation
  • Test sensitive data never logged

Files to Create:

  • ColaFlow.Modules.Mcp.Tests/Integration/ErrorHandlingIntegrationTests.cs

Testing Strategy

Unit Tests

  • Exception class creation and properties
  • Exception-to-MCP-error conversion
  • HTTP status code mapping
  • Sensitive data sanitization

Integration Tests

  • End-to-end error handling flow
  • Logging middleware writes logs correctly
  • Correlation ID in all log entries
  • No sensitive data in logs

Manual Testing Checklist

  • Trigger ParseError → verify logs and response
  • Trigger MethodNotFound → verify logs and response
  • Trigger InternalError → verify logs and response
  • Check logs for correlation ID
  • Check logs do NOT contain API Keys or passwords

Dependencies

Prerequisites:

  • Story 5.1 (MCP Protocol Handler) - Exception handling integrated here
  • Serilog NuGet packages
  • Serilog.Sinks.PostgreSQL

Used By:

  • All MCP Stories (5.5-5.12) - Rely on error handling

Risks & Mitigation

Risk Impact Probability Mitigation
Sensitive data logged High Medium Sanitization function, code review
Logging performance Medium Low Async logging, buffering
Log storage growth Medium Medium 30-day retention, log rotation
Exception details leaked High Low Never expose internal details to clients

Definition of Done

  • Code compiles without warnings
  • All unit tests passing (> 80% coverage)
  • All integration tests passing
  • Code reviewed and approved
  • XML documentation for exceptions
  • Serilog configured correctly
  • Correlation ID in all logs
  • No sensitive data in logs (verified)
  • Error responses conform to JSON-RPC 2.0

Notes

Why This Story Matters

  • Production Readiness: Cannot deploy without proper error handling
  • Observability: Structured logging enables monitoring and alerting
  • Debugging: Correlation ID makes troubleshooting 10x faster
  • Security: Prevents sensitive data leaks in logs

Key Design Decisions

  1. Custom Exceptions: Clear error semantics, easy to map to MCP errors
  2. Correlation ID: Track requests across distributed system
  3. Structured Logging: Machine-readable, queryable logs
  4. Sanitization: Prevent sensitive data leaks
  5. Async Logging: Minimal performance impact

Logging Best Practices

  • Use structured logging (not string interpolation)
  • Include correlation ID in all logs
  • Log at appropriate levels (DEBUG, INFO, ERROR)
  • Never log sensitive data (API Keys, passwords, PII)
  • Use async logging to avoid blocking
  • Rotate logs daily, retain for 30 days

Reference Materials