Distributed systems are inherently complex. Network partitions, node failures, and race conditions are not edge cases -- they are the norm. In this article, we will explore how Go's concurrency model makes it uniquely suited for building resilient backend services.
The circuit breaker pattern prevents cascading failures by wrapping calls to external services in a stateful proxy that monitors for failures and short-circuits requests when the failure rate exceeds a threshold.
Pro Tip: Always set the request timeout that feeds your circuit breaker well above the dependency's P99 latency. Otherwise, normally slow responses during load spikes get counted as failures and trip the breaker prematurely.
stateDiagram-v2
[*] --> Closed
Closed --> Open: Error Threshold Exceeded
Open --> HalfOpen: Sleep Window Expired
HalfOpen --> Closed: Success (Probe OK)
HalfOpen --> Open: Failure (Probe Failed)
note right of Closed: Normal State (Traffic flows freely)
note right of Open: Fail Fast (Requests rejected immediately)
note right of HalfOpen: Testing Recovery (Limited functionality)
type CircuitBreaker struct {
	mu          sync.RWMutex
	state       State
	failCount   int
	threshold   int
	timeout     time.Duration
	lastFailure time.Time
}

type State int

const (
	StateClosed State = iota
	StateOpen
	StateHalfOpen
)

var ErrCircuitOpen = errors.New("circuit breaker is open")

func (cb *CircuitBreaker) Execute(fn func() error) error {
	cb.mu.RLock()
	state, lastFailure := cb.state, cb.lastFailure
	cb.mu.RUnlock()
	if state == StateOpen {
		if time.Since(lastFailure) <= cb.timeout {
			return ErrCircuitOpen
		}
		// Re-check under the write lock: another goroutine may have
		// transitioned the state after we released the read lock.
		cb.mu.Lock()
		if cb.state == StateOpen {
			cb.state = StateHalfOpen
		}
		cb.mu.Unlock()
	}
	err := fn()
	if err != nil {
		cb.recordFailure()
		return err
	}
	cb.recordSuccess()
	return nil
}

// recordFailure trips the breaker at the threshold; a failed half-open probe re-opens it.
func (cb *CircuitBreaker) recordFailure() {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	cb.failCount++
	cb.lastFailure = time.Now()
	if cb.state == StateHalfOpen || cb.failCount >= cb.threshold {
		cb.state = StateOpen
	}
}

// recordSuccess closes the breaker and resets the failure counter.
func (cb *CircuitBreaker) recordSuccess() {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	cb.failCount = 0
	cb.state = StateClosed
}

When a dependency fails, your service should degrade gracefully rather than failing entirely. This means returning cached data, default values, or partial responses instead of errors. The key insight is that partial availability is almost always better than total unavailability.
Consider a product page that fetches data from multiple microservices: product details, reviews, recommendations, and pricing. If the recommendations service is down, you should still show the product with its reviews and pricing, perhaps with a generic "popular items" fallback for recommendations.
Implementing retries with exponential backoff and jitter prevents thundering herd problems when services recover from failures.
func RetryWithBackoff(ctx context.Context, maxRetries int, fn func() error) error {
	var lastErr error
	for i := 0; i < maxRetries; i++ {
		if lastErr = fn(); lastErr == nil {
			return nil
		}
		if i == maxRetries-1 {
			break // the final attempt failed; no point sleeping again
		}
		// Exponential backoff: 100ms, 200ms, 400ms, ... plus up to
		// 50% random jitter on top.
		base := time.Duration(1<<uint(i)) * 100 * time.Millisecond
		jitter := time.Duration(rand.Int63n(int64(base / 2)))
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(base + jitter):
		}
	}
	return fmt.Errorf("max retries (%d) exceeded: %w", maxRetries, lastErr)
}

The jitter component is crucial. Without it, all clients retry at exactly the same time, creating a synchronized burst that can overwhelm the recovering service. Jitter spreads retries across time, giving the service a chance to recover gradually.
sequenceDiagram
participant Client
participant Service
Client->>Service: Request 1 (Fail)
Service-->>Client: 503 Service Unavailable
Note over Client: Backoff 100ms
Client->>Service: Retry 1 (Fail)
Service-->>Client: 503 Service Unavailable
Note over Client: Backoff 200ms + Jitter
Client->>Service: Retry 2 (Success)
Service-->>Client: 200 OK
Proper health checks allow your orchestrator (Kubernetes, Nomad, etc.) to make informed decisions about routing traffic and restarting unhealthy instances. A liveness probe checks whether the process is alive, while a readiness probe checks whether it is ready to accept traffic.
func (s *Server) healthHandler(w http.ResponseWriter, r *http.Request) {
	checks := map[string]error{
		"database": s.db.Ping(r.Context()),
		"cache":    s.cache.Ping(r.Context()),
		"queue":    s.queue.Ping(r.Context()),
	}

	// error values don't marshal to useful JSON, so report strings.
	status := make(map[string]string, len(checks))
	healthy := true
	for name, err := range checks {
		if err != nil {
			healthy = false
			status[name] = err.Error()
		} else {
			status[name] = "ok"
		}
	}

	// Headers must be set before the status code is written.
	w.Header().Set("Content-Type", "application/json")
	if healthy {
		w.WriteHeader(http.StatusOK)
	} else {
		w.WriteHeader(http.StatusServiceUnavailable)
	}
	json.NewEncoder(w).Encode(status)
}

Building resilient distributed systems requires thinking about failure from the start. Go provides excellent primitives for concurrency and error handling that make it a natural fit for this domain. Combine circuit breakers, graceful degradation, and smart retry strategies to build services that withstand real-world conditions.
System Architecture Group
Experts in distributed systems, scalability, and high-performance computing.