--- story_id: story_5_4 sprint_id: sprint_5 phase: Phase 1 - Foundation status: not_started priority: P0 story_points: 3 assignee: backend estimated_days: 1 created_date: 2025-11-06 dependencies: [story_5_1] --- # Story 5.4: Error Handling & Logging **Phase**: Phase 1 - Foundation (Week 1-2) **Priority**: P0 CRITICAL **Estimated Effort**: 3 Story Points (1 day) ## User Story **As a** Backend Developer **I want** comprehensive error handling and structured logging for MCP operations **So that** issues can be quickly diagnosed and the system is observable in production ## Business Value Production-grade error handling and logging are critical for: - **Debugging**: Quickly diagnose issues in production - **Monitoring**: Track error rates and performance metrics - **Audit**: Complete trail of all MCP operations - **Security**: Detect and respond to suspicious activity **Impact**: - Reduces mean time to resolution (MTTR) by 50% - Enables proactive monitoring and alerting - Supports compliance requirements (audit logs) - Improves developer productivity ## Acceptance Criteria ### AC1: MCP Exception Hierarchy - [ ] Custom exception classes for MCP operations - [ ] Exception-to-MCP-error-code mapping (JSON-RPC 2.0) - [ ] Preserve stack traces and inner exceptions - [ ] Structured exception data (request ID, tenant ID, API Key ID) ### AC2: Global Exception Handler - [ ] Catch all unhandled exceptions - [ ] Map exceptions to MCP error responses - [ ] Log errors with full context - [ ] Return appropriate HTTP status codes - [ ] Never expose internal details to clients ### AC3: Structured Logging - [ ] Use Serilog with structured logging - [ ] Include correlation ID in all logs - [ ] Log request/response at DEBUG level - [ ] Log errors at ERROR level with stack traces - [ ] Log performance metrics (timing) - [ ] Never log sensitive data (API Keys, passwords, PII) ### AC4: Request/Response Logging Middleware - [ ] Log all MCP requests (method, params, correlation ID) - [ ] Log all MCP responses (result or error, timing) - [ ] Performance timing (request duration) - [ ] Exclude sensitive fields from logs (API Key hash) ### AC5: Performance Monitoring - [ ] Track request duration (P50, P95, P99) - [ ] Track error rates (by method, by tenant) - [ ] Track API Key usage (by key, by tenant) - [ ] Export metrics for Prometheus (future) ### AC6: Testing - [ ] Unit tests for exception mapping - [ ] Integration tests for error responses - [ ] Verify sensitive data never logged - [ ] Test correlation ID propagation ## Technical Design ### MCP Exception Hierarchy ```csharp public abstract class McpException : Exception { public int ErrorCode { get; } public object? ErrorData { get; } protected McpException( int errorCode, string message, object? errorData = null, Exception? innerException = null) : base(message, innerException) { ErrorCode = errorCode; ErrorData = errorData; } public McpError ToMcpError() { return new McpError { Code = ErrorCode, Message = Message, Data = ErrorData }; } } public class McpParseException : McpException { public McpParseException(string message, Exception? innerException = null) : base(McpErrorCode.ParseError, message, null, innerException) { } } public class McpInvalidRequestException : McpException { public McpInvalidRequestException(string message, object? errorData = null) : base(McpErrorCode.InvalidRequest, message, errorData) { } } public class McpMethodNotFoundException : McpException { public McpMethodNotFoundException(string method) : base(McpErrorCode.MethodNotFound, $"Method not found: {method}") { } } public class McpInvalidParamsException : McpException { public McpInvalidParamsException(string message, object? errorData = null) : base(McpErrorCode.InvalidParams, message, errorData) { } } public class McpUnauthorizedException : McpException { public McpUnauthorizedException(string message = "Unauthorized") : base(McpErrorCode.Unauthorized, message) { } } public class McpForbiddenException : McpException { public McpForbiddenException(string message = "Forbidden") : base(McpErrorCode.Forbidden, message) { } } public class McpNotFoundException : McpException { public McpNotFoundException(string resourceType, string resourceId) : base(McpErrorCode.NotFound, $"{resourceType} not found: {resourceId}") { } } public class McpValidationException : McpException { public McpValidationException(string message, object? errorData = null) : base(McpErrorCode.ValidationFailed, message, errorData) { } } ``` ### Global Exception Handler Middleware ```csharp public class McpExceptionHandlerMiddleware { private readonly RequestDelegate _next; private readonly ILogger _logger; public async Task InvokeAsync(HttpContext context) { try { await _next(context); } catch (McpException mcpEx) { await HandleMcpExceptionAsync(context, mcpEx); } catch (Exception ex) { await HandleUnexpectedExceptionAsync(context, ex); } } private async Task HandleMcpExceptionAsync( HttpContext context, McpException mcpEx) { var correlationId = context.Items["CorrelationId"] as string; _logger.LogError(mcpEx, "MCP Error: {ErrorCode} - {Message} (CorrelationId: {CorrelationId})", mcpEx.ErrorCode, mcpEx.Message, correlationId); var response = new McpResponse { JsonRpc = "2.0", Error = mcpEx.ToMcpError(), Id = context.Items["McpRequestId"] as string }; context.Response.StatusCode = GetHttpStatusCode(mcpEx.ErrorCode); context.Response.ContentType = "application/json"; await context.Response.WriteAsJsonAsync(response); } private async Task HandleUnexpectedExceptionAsync( HttpContext context, Exception ex) { var correlationId = context.Items["CorrelationId"] as string; _logger.LogError(ex, "Unexpected error in MCP Server (CorrelationId: {CorrelationId})", correlationId); var response = new McpResponse { JsonRpc = "2.0", Error = new McpError { Code = McpErrorCode.InternalError, Message = "Internal server error" // Do NOT expose exception details }, Id = context.Items["McpRequestId"] as string }; context.Response.StatusCode = 500; context.Response.ContentType = "application/json"; await context.Response.WriteAsJsonAsync(response); } private static int GetHttpStatusCode(int mcpErrorCode) { return mcpErrorCode switch { McpErrorCode.Unauthorized => 401, McpErrorCode.Forbidden => 403, McpErrorCode.NotFound => 404, McpErrorCode.ValidationFailed => 422, _ => 400 }; } } ``` ### Request/Response Logging Middleware ```csharp public class McpLoggingMiddleware { private readonly RequestDelegate _next; private readonly ILogger _logger; public async Task InvokeAsync(HttpContext context) { // Generate correlation ID var correlationId = Guid.NewGuid().ToString(); context.Items["CorrelationId"] = correlationId; // Start timing var stopwatch = Stopwatch.StartNew(); // Enable request buffering for logging context.Request.EnableBuffering(); // Log request await LogRequestAsync(context, correlationId); // Capture response var originalBody = context.Response.Body; using var memoryStream = new MemoryStream(); context.Response.Body = memoryStream; try { await _next(context); stopwatch.Stop(); // Log response await LogResponseAsync( context, correlationId, stopwatch.ElapsedMilliseconds); // Copy response to original stream memoryStream.Seek(0, SeekOrigin.Begin); await memoryStream.CopyToAsync(originalBody); } finally { context.Response.Body = originalBody; } } private async Task LogRequestAsync( HttpContext context, string correlationId) { context.Request.Body.Seek(0, SeekOrigin.Begin); var body = await new StreamReader(context.Request.Body).ReadToEndAsync(); context.Request.Body.Seek(0, SeekOrigin.Begin); var tenantId = context.Items["TenantId"]; var apiKeyId = context.Items["ApiKeyId"]; _logger.LogDebug( "MCP Request: {Method} {Path} (CorrelationId: {CorrelationId}, " + "TenantId: {TenantId}, ApiKeyId: {ApiKeyId})\n{Body}", context.Request.Method, context.Request.Path, correlationId, tenantId, apiKeyId, SanitizeBody(body)); } private async Task LogResponseAsync( HttpContext context, string correlationId, long elapsedMs) { context.Response.Body.Seek(0, SeekOrigin.Begin); var body = await new StreamReader(context.Response.Body).ReadToEndAsync(); context.Response.Body.Seek(0, SeekOrigin.Begin); var logLevel = context.Response.StatusCode >= 400 ? LogLevel.Error : LogLevel.Debug; _logger.Log(logLevel, "MCP Response: {StatusCode} (CorrelationId: {CorrelationId}, " + "Duration: {Duration}ms)\n{Body}", context.Response.StatusCode, correlationId, elapsedMs, body); } private string SanitizeBody(string body) { // Remove sensitive data (API Keys, passwords, etc.) // This is a simplified example return body.Replace("\"key_hash\":", "\"key_hash\":[REDACTED]"); } } ``` ### Serilog Configuration ```csharp public static class SerilogConfiguration { public static void ConfigureSerilog(WebApplicationBuilder builder) { builder.Host.UseSerilog((context, services, configuration) => { configuration .ReadFrom.Configuration(context.Configuration) .Enrich.FromLogContext() .Enrich.WithMachineName() .Enrich.WithEnvironmentName() .Enrich.WithProperty("Application", "ColaFlow") .WriteTo.Console( outputTemplate: "[{Timestamp:HH:mm:ss} {Level:u3}] " + "{CorrelationId} {Message:lj}{NewLine}{Exception}") .WriteTo.File( path: "logs/colaflow-.log", rollingInterval: RollingInterval.Day, retainedFileCountLimit: 30, outputTemplate: "[{Timestamp:yyyy-MM-dd HH:mm:ss.fff zzz}] " + "[{Level:u3}] {CorrelationId} {Message:lj}{NewLine}{Exception}") .WriteTo.PostgreSQL( connectionString: context.Configuration.GetConnectionString("Default"), tableName: "logs", needAutoCreateTable: true); }); } } ``` ## Tasks ### Task 1: MCP Exception Classes (2 hours) - [ ] Create `McpException` base class - [ ] Create 8 specific exception classes - [ ] Implement `ToMcpError()` method - [ ] Add XML documentation **Files to Create**: - `ColaFlow.Modules.Mcp/Exceptions/McpException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpParseException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpInvalidRequestException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpMethodNotFoundException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpInvalidParamsException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpUnauthorizedException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpForbiddenException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpNotFoundException.cs` - `ColaFlow.Modules.Mcp/Exceptions/McpValidationException.cs` ### Task 2: Global Exception Handler Middleware (2 hours) - [ ] Create `McpExceptionHandlerMiddleware` - [ ] Handle `McpException` (map to MCP error) - [ ] Handle unexpected exceptions (return InternalError) - [ ] Map error codes to HTTP status codes - [ ] Add structured logging **Files to Create**: - `ColaFlow.Modules.Mcp/Middleware/McpExceptionHandlerMiddleware.cs` ### Task 3: Request/Response Logging Middleware (3 hours) - [ ] Create `McpLoggingMiddleware` - [ ] Generate correlation ID - [ ] Log request (method, params, timing) - [ ] Log response (result/error, timing) - [ ] Sanitize sensitive data (API Keys, passwords) **Files to Create**: - `ColaFlow.Modules.Mcp/Middleware/McpLoggingMiddleware.cs` ### Task 4: Serilog Configuration (1 hour) - [ ] Configure Serilog (Console, File, PostgreSQL sinks) - [ ] Add enrichers (correlation ID, machine name, environment) - [ ] Configure structured logging format - [ ] Set log levels (DEBUG for dev, INFO for prod) **Files to Modify**: - `ColaFlow.Api/Program.cs` ### Task 5: Unit Tests (3 hours) - [ ] Test exception-to-MCP-error mapping - [ ] Test HTTP status code mapping - [ ] Test sensitive data sanitization - [ ] Test correlation ID generation **Files to Create**: - `ColaFlow.Modules.Mcp.Tests/Exceptions/McpExceptionTests.cs` - `ColaFlow.Modules.Mcp.Tests/Middleware/McpExceptionHandlerTests.cs` - `ColaFlow.Modules.Mcp.Tests/Middleware/McpLoggingMiddlewareTests.cs` ### Task 6: Integration Tests (2 hours) - [ ] Test end-to-end error handling - [ ] Test logging middleware (verify logs written) - [ ] Test correlation ID propagation - [ ] Test sensitive data never logged **Files to Create**: - `ColaFlow.Modules.Mcp.Tests/Integration/ErrorHandlingIntegrationTests.cs` ## Testing Strategy ### Unit Tests - Exception class creation and properties - Exception-to-MCP-error conversion - HTTP status code mapping - Sensitive data sanitization ### Integration Tests - End-to-end error handling flow - Logging middleware writes logs correctly - Correlation ID in all log entries - No sensitive data in logs ### Manual Testing Checklist - [ ] Trigger ParseError → verify logs and response - [ ] Trigger MethodNotFound → verify logs and response - [ ] Trigger InternalError → verify logs and response - [ ] Check logs for correlation ID - [ ] Check logs do NOT contain API Keys or passwords ## Dependencies **Prerequisites**: - Story 5.1 (MCP Protocol Handler) - Exception handling integrated here - Serilog NuGet packages - Serilog.Sinks.PostgreSQL **Used By**: - All MCP Stories (5.5-5.12) - Rely on error handling ## Risks & Mitigation | Risk | Impact | Probability | Mitigation | |------|--------|-------------|------------| | Sensitive data logged | High | Medium | Sanitization function, code review | | Logging performance | Medium | Low | Async logging, buffering | | Log storage growth | Medium | Medium | 30-day retention, log rotation | | Exception details leaked | High | Low | Never expose internal details to clients | ## Definition of Done - [ ] Code compiles without warnings - [ ] All unit tests passing (> 80% coverage) - [ ] All integration tests passing - [ ] Code reviewed and approved - [ ] XML documentation for exceptions - [ ] Serilog configured correctly - [ ] Correlation ID in all logs - [ ] No sensitive data in logs (verified) - [ ] Error responses conform to JSON-RPC 2.0 ## Notes ### Why This Story Matters - **Production Readiness**: Cannot deploy without proper error handling - **Observability**: Structured logging enables monitoring and alerting - **Debugging**: Correlation ID makes troubleshooting 10x faster - **Security**: Prevents sensitive data leaks in logs ### Key Design Decisions 1. **Custom Exceptions**: Clear error semantics, easy to map to MCP errors 2. **Correlation ID**: Track requests across distributed system 3. **Structured Logging**: Machine-readable, queryable logs 4. **Sanitization**: Prevent sensitive data leaks 5. **Async Logging**: Minimal performance impact ### Logging Best Practices - Use structured logging (not string interpolation) - Include correlation ID in all logs - Log at appropriate levels (DEBUG, INFO, ERROR) - Never log sensitive data (API Keys, passwords, PII) - Use async logging to avoid blocking - Rotate logs daily, retain for 30 days ### Reference Materials - Serilog Documentation: https://serilog.net/ - JSON-RPC 2.0 Error Codes: https://www.jsonrpc.org/specification#error_object - Sprint 5 Plan: `docs/plans/sprint_5.md`