Health checks are crucial for maintaining reliable applications in production. They enable container orchestrators, load balancers, and monitoring systems to detect issues early and take corrective action. This guide covers health checks in ASP.NET Core applications, from a basic implementation to production patterns.

What Are Health Checks?

Health checks are lightweight endpoints that report the operational status of your application and its dependencies. They answer the simple question: "Is my application ready to serve requests?" This information is used by:

  • Container orchestrators (Kubernetes, Docker Swarm) for liveness and readiness probes
  • Load balancers to route traffic away from unhealthy instances
  • Monitoring systems to alert on service degradation
  • Deployment tools to validate successful deployments

Basic Health Check Implementation

Getting Started

ASP.NET Core provides built-in health check middleware. Here's the simplest implementation:

var builder = WebApplication.CreateBuilder(args);

// Register health check services
builder.Services.AddHealthChecks();

var app = builder.Build();

// Map health check endpoint
app.MapHealthChecks("/healthz");

app.Run();

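This creates a basic health check endpoint at /healthz that returns a plain-text body:

  • 200 OK with "Healthy" if all registered checks pass (with no checks registered, this only proves the application can respond)
  • 200 OK with "Degraded" if a check reports a degraded state
  • 503 Service Unavailable with "Unhealthy" if any check fails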
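
The status codes are configurable. As a minimal sketch, the mapping below spells out the default behavior through HealthCheckOptions.ResultStatusCodes; changing the Degraded entry to 503 would make load balancers treat degraded instances as unavailable. The endpoint path is the one used above, and the rest is illustrative.

```csharp
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddHealthChecks();

var app = builder.Build();

app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    // These values mirror the defaults; adjust them to change how each status is reported
    ResultStatusCodes =
    {
        [HealthStatus.Healthy] = StatusCodes.Status200OK,
        [HealthStatus.Degraded] = StatusCodes.Status200OK,
        [HealthStatus.Unhealthy] = StatusCodes.Status503ServiceUnavailable
    }
});

app.Run();
```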

Adding Database Health Checks

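Most applications depend on databases. The two checks below are alternatives, so register whichever fits your data access layer. AddSqlServer comes from the AspNetCore.HealthChecks.SqlServer NuGet package, and AddDbContextCheck from Microsoft.Extensions.Diagnostics.HealthChecks.EntityFrameworkCore: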

// For SQL Server
builder.Services.AddHealthChecks()
    .AddSqlServer(
        connectionString: builder.Configuration.GetConnectionString("DefaultConnection")!,
        name: "database",
        tags: new[] { "db", "sql" });

// For Entity Framework Core
builder.Services.AddHealthChecks()
    .AddDbContextCheck<ApplicationDbContext>(
        name: "database",
        tags: new[] { "db", "ef" });

Custom Health Checks

Creating Custom Health Checks

For dependencies that the built-in checks do not cover, implement the IHealthCheck interface:

public class ExternalApiHealthCheck : IHealthCheck
{
    private readonly HttpClient _httpClient;
    private readonly ILogger<ExternalApiHealthCheck> _logger;

    public ExternalApiHealthCheck(HttpClient httpClient, ILogger<ExternalApiHealthCheck> logger)
    {
        _httpClient = httpClient;
        _logger = logger;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        try
        {
            // Measure the round trip (Stopwatch requires using System.Diagnostics)
            var stopwatch = Stopwatch.StartNew();
            using var response = await _httpClient.GetAsync("/api/health", cancellationToken);
            stopwatch.Stop();
            
            if (response.IsSuccessStatusCode)
            {
                return HealthCheckResult.Healthy("External API is responsive", 
                    new Dictionary<string, object>
                    {
                        ["response_time_ms"] = stopwatch.ElapsedMilliseconds,
                        ["status_code"] = (int)response.StatusCode
                    });
            }

            return HealthCheckResult.Degraded($"External API returned {response.StatusCode}");
        }
        catch (TaskCanceledException)
        {
            return HealthCheckResult.Unhealthy("External API timeout");
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Health check failed for external API");
            return HealthCheckResult.Unhealthy("External API unreachable", ex);
        }
    }
}

Registering Custom Health Checks

builder.Services.AddHttpClient<ExternalApiHealthCheck>(client =>
{
    client.BaseAddress = new Uri("https://api.external-service.com");
    client.Timeout = TimeSpan.FromSeconds(5);
});

builder.Services.AddHealthChecks()
    .AddCheck<ExternalApiHealthCheck>(
        name: "external-api",
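        // failureStatus is reported when the check throws an unhandled exception or times out;
        // a result the check returns explicitly (such as Unhealthy) is reported unchanged
        failureStatus: HealthStatus.Degraded,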
        tags: new[] { "external", "api" });

Advanced Health Check Patterns

Readiness vs. Liveness Checks

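In containerized environments, you usually need two kinds of health checks. A liveness probe tells the orchestrator whether the process is still working; if it fails, the container is restarted. A readiness probe tells it whether the instance can accept traffic; if it fails, the instance is taken out of rotation but keeps running. The example below reports "not ready" until startup work has finished: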

public class StartupHealthCheck : IHealthCheck
{
    private volatile bool _isReady = false;

    public bool IsReady 
    { 
        get => _isReady; 
        set => _isReady = value; 
    }

    public Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        if (_isReady)
        {
            return Task.FromResult(HealthCheckResult.Healthy("Application is ready"));
        }

        return Task.FromResult(HealthCheckResult.Unhealthy("Application is still starting"));
    }
}

public class StartupService : BackgroundService
{
    private readonly StartupHealthCheck _healthCheck;

    public StartupService(StartupHealthCheck healthCheck)
    {
        _healthCheck = healthCheck;
    }

    // A BackgroundService does not hold up host startup while this runs, so the
    // readiness endpoint can answer "still starting" until the work completes
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Simulate startup work
        await Task.Delay(TimeSpan.FromSeconds(10), stoppingToken);
        _healthCheck.IsReady = true;
    }
}

// Registration
builder.Services.AddSingleton<StartupHealthCheck>();
builder.Services.AddHostedService<StartupService>();

builder.Services.AddHealthChecks()
    .AddCheck<StartupHealthCheck>("startup", tags: new[] { "ready" });

// Configure different endpoints
app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready")
});

app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = _ => false // Run no checks: reports Healthy whenever the process can respond
});

Custom Response Format

Customize the health check response format for better observability:

public static class HealthCheckExtensions
{
    public static Task WriteHealthCheckResponse(HttpContext context, HealthReport result)
    {
        context.Response.ContentType = "application/json; charset=utf-8";

        var response = new
        {
            status = result.Status.ToString(),
            duration = result.TotalDuration.TotalMilliseconds,
            checks = result.Entries.Select(entry => new
            {
                name = entry.Key,
                status = entry.Value.Status.ToString(),
                duration = entry.Value.Duration.TotalMilliseconds,
                description = entry.Value.Description,
                data = entry.Value.Data,
                exception = entry.Value.Exception?.Message
            }),
            timestamp = DateTime.UtcNow
        };

        return context.Response.WriteAsync(JsonSerializer.Serialize(response, new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
            WriteIndented = true
        }));
    }
}

// Usage
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    ResponseWriter = HealthCheckExtensions.WriteHealthCheckResponse
});

Database Health Check Patterns

Efficient Database Checks

Be careful with database health checks to avoid impacting performance:

public class EfficientDatabaseHealthCheck : IHealthCheck
{
    private readonly IDbConnectionFactory _connectionFactory;
    private static readonly SemaphoreSlim _semaphore = new(1, 1);

    public EfficientDatabaseHealthCheck(IDbConnectionFactory connectionFactory)
    {
        _connectionFactory = connectionFactory;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        // Prevent concurrent database checks
        if (!await _semaphore.WaitAsync(100, cancellationToken))
        {
            return HealthCheckResult.Healthy("Database check skipped (concurrent check in progress)");
        }

        try
        {
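            // IDbConnectionFactory is an application-specific abstraction; this sketch assumes
            // CreateConnectionAsync returns an already-open DbConnection
            await using var connection = await _connectionFactory.CreateConnectionAsync(cancellationToken);
            
            // Use simple, fast query
            const string sql = "SELECT 1";
            await using var command = connection.CreateCommand();
            command.CommandText = sql;
            command.CommandTimeout = 5; // Short timeout, in seconds
            
            var result = await command.ExecuteScalarAsync(cancellationToken);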
            
            return result != null 
                ? HealthCheckResult.Healthy("Database connection successful")
                : HealthCheckResult.Unhealthy("Database query returned null");
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database connection failed", ex);
        }
        finally
        {
            _semaphore.Release();
        }
    }
}

Connection Pool Health

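DbContext does not expose connection pool counters, so this check approximates pool health: when the pool is exhausted, CanConnectAsync fails or times out and the check reports Unhealthy. It also records the connection state and database name for diagnostics: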

public class ConnectionPoolHealthCheck : IHealthCheck
{
    private readonly ApplicationDbContext _context;

    public ConnectionPoolHealthCheck(ApplicationDbContext context)
    {
        _context = context;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        try
        {
            var canConnect = await _context.Database.CanConnectAsync(cancellationToken);
            
            if (!canConnect)
            {
                return HealthCheckResult.Unhealthy("Cannot connect to database");
            }

            // Check connection state
            var connectionState = _context.Database.GetDbConnection().State;
            var data = new Dictionary<string, object>
            {
                ["connection_state"] = connectionState.ToString(),
                ["database_name"] = _context.Database.GetDbConnection().Database
            };

            return HealthCheckResult.Healthy("Database connection pool healthy", data);
        }
        catch (Exception ex)
        {
            return HealthCheckResult.Unhealthy("Database connection pool check failed", ex);
        }
    }
}

Health Check Configuration

Comprehensive Configuration

builder.Services.AddHealthChecks()
    // Basic application health
    .AddCheck("self", () => HealthCheckResult.Healthy("Application is running"))
    
    // Database checks
    .AddDbContextCheck<ApplicationDbContext>(
        name: "database",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { "db", "ready" })
    
    // External dependencies
    .AddCheck<ExternalApiHealthCheck>(
        name: "external-api",
        failureStatus: HealthStatus.Degraded,
        tags: new[] { "external" })
    
    // Memory usage check
    .AddCheck("memory", () =>
    {
        var allocatedBytes = GC.GetTotalMemory(false);
        var data = new Dictionary<string, object>
        {
            ["allocated_bytes"] = allocatedBytes,
            ["allocated_mb"] = allocatedBytes / 1024 / 1024
        };

        return allocatedBytes < 500_000_000 // 500MB threshold
            ? HealthCheckResult.Healthy("Memory usage normal", data)
            : HealthCheckResult.Degraded("High memory usage", data);
    }, tags: new[] { "memory" })
    
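    // Disk space check (AddDiskStorageHealthCheck comes from the AspNetCore.HealthChecks.System package)
    .AddDiskStorageHealthCheck(options =>
    {
        options.AddDrive("C:\\", 1024); // 1GB minimum free space; Windows drive name, adjust for Linux containers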
    }, tags: new[] { "storage" });

// Configure different endpoints for different purposes
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    Predicate = _ => true,
    ResponseWriter = HealthCheckExtensions.WriteHealthCheckResponse
});

app.MapHealthChecks("/healthz/ready", new HealthCheckOptions
{
    Predicate = check => check.Tags.Contains("ready"),
    ResponseWriter = HealthCheckExtensions.WriteHealthCheckResponse
});

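app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    // Liveness should not depend on the database or external services;
    // otherwise an outage there makes the orchestrator restart healthy containers
    Predicate = check => check.Name == "self"
});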

Production Best Practices

Security Considerations

// Restrict health check access in production
app.MapHealthChecks("/healthz", new HealthCheckOptions
{
    ResponseWriter = HealthCheckExtensions.WriteHealthCheckResponse
}).RequireHost("*:5001"); // Only accessible on management port

// Or require authorization
app.MapHealthChecks("/healthz/detailed", new HealthCheckOptions
{
    ResponseWriter = HealthCheckExtensions.WriteHealthCheckResponse
}).RequireAuthorization("HealthCheckPolicy");

// Configure authorization policy (this registration belongs before builder.Build())
builder.Services.AddAuthorization(options =>
{
    options.AddPolicy("HealthCheckPolicy", policy =>
        policy.RequireRole("HealthChecker")
              .RequireAssertion(context => 
                  context.User.HasClaim("scope", "health:read")));
});

Caching and Performance

public class CachedHealthCheck : IHealthCheck
{
    private readonly IHealthCheck _innerHealthCheck;
    private readonly IMemoryCache _cache;
    private readonly TimeSpan _cacheDuration;

    public CachedHealthCheck(
        IHealthCheck innerHealthCheck, 
        IMemoryCache cache,
        TimeSpan cacheDuration)
    {
        _innerHealthCheck = innerHealthCheck;
        _cache = cache;
        _cacheDuration = cacheDuration;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        var cacheKey = $"healthcheck_{context.Registration.Name}";
        
        // HealthCheckResult is a struct, so use the non-nullable type here
        if (_cache.TryGetValue(cacheKey, out HealthCheckResult cachedResult))
        {
            return cachedResult;
        }

        var result = await _innerHealthCheck.CheckHealthAsync(context, cancellationToken);
        
        _cache.Set(cacheKey, result, _cacheDuration);
        
        return result;
    }
}
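
Because CachedHealthCheck takes constructor arguments that dependency injection cannot supply on its own (the inner check and the cache duration), it needs a factory-based registration. The following is a minimal sketch, assuming ExternalApiHealthCheck is already registered as a typed HttpClient as shown earlier; the check name "external-api-cached" and the 15-second duration are illustrative choices, not recommendations.

```csharp
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

// IMemoryCache must be registered for the wrapper to resolve it
builder.Services.AddMemoryCache();

builder.Services.AddHealthChecks()
    .Add(new HealthCheckRegistration(
        name: "external-api-cached",
        factory: sp => new CachedHealthCheck(
            // Assumes the inner check is resolvable from the container
            sp.GetRequiredService<ExternalApiHealthCheck>(),
            sp.GetRequiredService<IMemoryCache>(),
            TimeSpan.FromSeconds(15)),
        failureStatus: HealthStatus.Degraded,
        tags: new[] { "external" }));
```

Keep the cache duration shorter than the probe interval you can tolerate being stale; a cached Unhealthy result is also served until it expires.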

Kubernetes Integration

Configure your Kubernetes deployment to use health checks:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        ports:
        - containerPort: 8080
        - containerPort: 8081  # Health check port
        livenessProbe:
          httpGet:
            path: /healthz/live
            port: 8081
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /healthz/ready
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3

Monitoring and Alerting

Health Check Publisher

Publish health check results to monitoring systems:

public class PrometheusHealthCheckPublisher : IHealthCheckPublisher
{
    private static readonly Counter HealthCheckCounter = Metrics
        .CreateCounter("health_check_total", "Total health checks", new[] { "name", "status" });

    public Task PublishAsync(HealthReport report, CancellationToken cancellationToken)
    {
        foreach (var entry in report.Entries)
        {
            HealthCheckCounter
                .WithLabels(entry.Key, entry.Value.Status.ToString())
                .Inc();
        }

        return Task.CompletedTask;
    }
}

// Register the publisher
builder.Services.AddSingleton<IHealthCheckPublisher, PrometheusHealthCheckPublisher>();

builder.Services.Configure<HealthCheckPublisherOptions>(options =>
{
    options.Delay = TimeSpan.FromSeconds(30);
    options.Period = TimeSpan.FromSeconds(30);
});

Testing Health Checks

Unit Testing

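[Test]
public async Task ExternalApiHealthCheck_Should_Return_Healthy_When_Api_Responds()
{
    // Arrange: HttpClient.GetAsync is not virtual, so stub the message handler
    // instead of mocking HttpClient itself
    var httpClient = new HttpClient(new StubHandler(HttpStatusCode.OK))
    {
        BaseAddress = new Uri("https://api.external-service.com")
    };
    var healthCheck = new ExternalApiHealthCheck(
        httpClient, NullLogger<ExternalApiHealthCheck>.Instance);
    var context = new HealthCheckContext
    {
        Registration = new HealthCheckRegistration("test", healthCheck, null, null)
    };

    // Act
    var result = await healthCheck.CheckHealthAsync(context);

    // Assert
    Assert.That(result.Status, Is.EqualTo(HealthStatus.Healthy));
}

// Returns a fixed status code without touching the network
private sealed class StubHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _statusCode;

    public StubHandler(HttpStatusCode statusCode) => _statusCode = statusCode;

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromResult(new HttpResponseMessage(_statusCode));
}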

Common Pitfalls and Solutions

Avoid These Mistakes

  • Don't make health checks too expensive - They run frequently and can impact performance
  • Don't include business logic - Health checks should only verify infrastructure dependencies
  • Don't use the same endpoint for different purposes - Separate liveness from readiness
  • Don't ignore timeouts - Always set reasonable timeouts for external calls
  • Don't expose sensitive information - Be careful what data you include in responses
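
To illustrate the timeout advice, here is a minimal sketch that sets a per-check timeout at registration time. The check name "slow-dependency" and the simulated 10-second delay are hypothetical stand-ins for a real dependency call; when the timeout elapses, the health check service is expected to cancel the token and report the check as failed rather than let the probe hang.

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddHealthChecks()
    .AddAsyncCheck(
        name: "slow-dependency",
        check: async cancellationToken =>
        {
            // Stand-in for a slow external call; it observes the token so it can be cancelled
            await Task.Delay(TimeSpan.FromSeconds(10), cancellationToken);
            return HealthCheckResult.Healthy("Dependency responded");
        },
        tags: new[] { "external" },
        // Give up after 2 seconds instead of waiting for the full call
        timeout: TimeSpan.FromSeconds(2));
```

Choose a timeout shorter than the probe's own timeoutSeconds so the application reports the failure before the orchestrator gives up on the request.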

Performance Guidelines

  • Keep health checks under 1 second execution time
  • Use connection pooling for database checks
  • Cache results for expensive checks
  • Implement circuit breakers for external dependencies
  • Use the appropriate status (Healthy, Degraded, Unhealthy) so that a failing optional dependency degrades the service instead of failing it
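
The circuit breaker guideline can be sketched as follows. SimpleCircuitBreaker is a hypothetical helper written for this article, not part of ASP.NET Core; production code would more likely use a resilience library. It opens after a configurable number of consecutive failures and stays open for a cool-down period, during which a health check can report Degraded without calling the dependency at all.

```csharp
using System;

// Hypothetical helper: trips after N consecutive failures and stays open for a cool-down period
public sealed class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _clock;
    private readonly object _gate = new object();
    private int _consecutiveFailures;
    private DateTimeOffset? _openedAt;

    // The clock is injected so the cool-down can be tested without waiting
    public SimpleCircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTimeOffset> clock)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock;
    }

    // True while calls to the dependency should be skipped
    public bool IsOpen
    {
        get
        {
            lock (_gate)
            {
                return _openedAt.HasValue && _clock() - _openedAt.Value < _openDuration;
            }
        }
    }

    // A successful call closes the breaker and resets the failure count
    public void RecordSuccess()
    {
        lock (_gate)
        {
            _consecutiveFailures = 0;
            _openedAt = null;
        }
    }

    // A failed call counts toward the threshold; reaching it (re)opens the breaker
    public void RecordFailure()
    {
        lock (_gate)
        {
            _consecutiveFailures++;
            if (_consecutiveFailures >= _failureThreshold)
            {
                _openedAt = _clock();
            }
        }
    }
}
```

Inside CheckHealthAsync, a check would test IsOpen first and return a Degraded result immediately when it is true; otherwise it calls the dependency and records the outcome with RecordSuccess or RecordFailure. Once the cool-down has elapsed the next call goes through as a trial, and a single further failure reopens the breaker.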

Conclusion

Health checks are a critical component of resilient applications. They provide early warning of issues and enable automated recovery mechanisms. By implementing comprehensive health checks with proper separation of concerns, caching strategies, and security considerations, you can build applications that are more reliable and easier to operate in production.

Remember to start simple with basic health checks and gradually add more sophisticated monitoring as your application grows. The key is finding the right balance between comprehensive monitoring and performance impact.