Go Context in Depth: Cancellation, Timeouts, and Debugging in Production
Key Takeaways
- Goroutine leaks pass health checks until a traffic spike pushes accumulated stacks past the container memory limit; context cancellation is the primary defence against unbounded goroutine accumulation
- Timeout budgets shrink at every layer: if the parent has 100ms left and you create a 5-second child timeout, the child still expires in 100ms. Check ctx.Deadline() before each call to avoid silent failures
- sync.WaitGroup coordination is essential for graceful shutdown: http.Server.Shutdown waits only for in-flight requests, not for background workers
- errgroup.WithContext cancels all sibling goroutines on first error, which is perfect for fan-out API aggregation but dangerous for fire-and-forget background jobs
The classic production goroutine leak in Go looks like this: a handler spawns a goroutine, returns 200, and the goroutine outlives the request because nobody propagated cancellation. Multiply by 10,000 requests per second and the process leaks goroutines until it is OOMKilled, typically during the next traffic spike. We debugged this exact failure pattern on multiple production Go services, and every variant traced back to the same root cause: a missing ctx parameter or a missing defer cancel().
Goroutine leaks are one of the most common resource-exhaustion patterns in Go microservices[Go Runtime GC]. A goroutine blocked on a database or network call consumes a stack (small initial stack, grown dynamically)[Go context], holds connections in whatever pool it is waiting on, and accumulates silently. The process continues to respond to health checks until memory or file descriptor limits are hit — typically during the next traffic spike. The root cause is always the same: work that nobody cares about anymore keeps running because there is no signal to stop it.
context.Context[Go Language Specification] is that signal.
Context carries cancellation signals through your call chain. When a parent context is cancelled — because a deadline expired, a client disconnected, or you explicitly called cancel() — all child contexts are cancelled too. Always pass context as the first parameter, always defer cancel(), and use errgroup for fan-out. This prevents goroutine leaks and ensures graceful shutdown.
- Pass r.Context() down from every HTTP handler; derive child contexts with WithTimeout, WithCancel, or WithDeadline
- Always defer cancel() after every context.WithTimeout or context.WithCancel
- Use golang.org/x/sync/errgroup for fan-out; it cancels all goroutines on first error
graph TD
Root[context.Background<br/>process root] --> Req[r.Context<br/>HTTP request]
Req -->|cancelled when client disconnects| Req
Req --> WT[WithTimeout 2s]
Req --> WC[WithCancel]
Req --> WD[WithDeadline]
WT --> Child1[DB call goroutine]
WC --> Child2[Stream goroutine]
WD --> Child3[Outbound RPC goroutine]
Cancel{cancel signal} -.->|propagates down| WT
Cancel -.->|propagates down| WC
Cancel -.->|propagates down| WD
Survive[WithoutCancel<br/>Go 1.21+] -.->|fire-and-forget audit| AuditTask[Saves payment after client disconnect]
style Cancel fill:#fee
style Root fill:#eef
style Survive fill:#efe
The diagram is the context tree: every derived context inherits cancel propagation from its parent, except WithoutCancel (Go 1.21+) which keeps a child alive past the parent's death — useful for fire-and-forget audit/persistence work that should outlive the request that started it.
The Quick Start: Context Propagation Patterns
Every context.Context is part of a tree. The root is context.Background(). When you call WithTimeout, WithCancel, or WithDeadline, you create a child context. When a parent is cancelled, all descendants are cancelled too — immediately.
| Constructor | Use When | Pattern |
|---|---|---|
| context.Background() | Program startup, background workers, operations outliving a request | runServer(context.Background()) |
| context.WithTimeout(parent, d) | Relative timeout from now | ctx, cancel := context.WithTimeout(ctx, 2*time.Second); defer cancel() |
| context.WithDeadline(parent, t) | Absolute deadline | ctx, cancel := context.WithDeadline(ctx, time.Now().Add(2*time.Second)); defer cancel() |
| context.WithCancel(parent) | Manual cancellation, no time limit | ctx, cancel := context.WithCancel(ctx); defer cancel() |
| context.WithoutCancel(parent) | Child survives parent cancellation but loses the parent's deadline (Go 1.21+) | auditCtx := context.WithoutCancel(ctx) for fire-and-forget cleanup |
In an HTTP server, r.Context() is the entry point. It is cancelled when the client's connection closes, when the request is cancelled (HTTP/2), or when the handler returns. Pass it down your call chain (and detach with WithoutCancel for any goroutine meant to outlive the handler):
func (s *PaymentService) ProcessPayment(w http.ResponseWriter, r *http.Request) {
var req PaymentRequest
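// Reject malformed bodies before starting any background work.
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "invalid request body", http.StatusBadRequest)
return
}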
// r.Context() is cancelled the instant this handler returns (below), so a
// fire-and-forget goroutine must not use it directly. WithoutCancel (Go 1.21+)
// detaches from that cancellation while keeping request-scoped values (trace
// IDs); add an explicit timeout since it also drops the parent deadline.
bgCtx := context.WithoutCancel(r.Context())
go func() {
ctx, cancel := context.WithTimeout(bgCtx, 10*time.Second)
defer cancel()
result, err := s.gateway.ChargeWithContext(ctx, req.CardToken, req.Amount)
if err != nil {
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return
}
log.Error("payment failed", "error", err)
return
}
s.db.SavePaymentWithContext(ctx, result)
}()
w.WriteHeader(http.StatusAccepted)
json.NewEncoder(w).Encode(map[string]string{"status": "processing"})
}
Timeout Budgets: Children Can Only Shrink
[Go context] The deadline propagation tree. Every layer subtracts from the parent's budget and never extends it:
graph TD
Client[Client request<br/>3000 ms timeout<br/>Server.ReadHeaderTimeout caps it] --> Handler[HTTP handler<br/>r.Context: 3000 ms left]
Handler -->|WithTimeout 1500 ms| Service[Service.PlaceOrder<br/>1500 ms left<br/>middle reserves 1500 ms<br/>for downstream + slack]
Service -->|WithTimeout 800 ms| DB[Repository.Save<br/>800 ms left]
Service -->|WithTimeout 500 ms| Pay[Gateway.Charge<br/>500 ms left]
Service -->|WithTimeout 400 ms| Inv[Inventory.Reserve<br/>400 ms left]
DB --> Pool{DB pool acquire<br/>+ tx + commit}
Pay --> HTTP{Outbound HTTP<br/>+ TLS + body read}
Inv --> RPC{gRPC call}
Pool -.->|exceeds 800 ms| Cancel[ctx.Err == DeadlineExceeded<br/>tx rolls back<br/>conn returned to pool]
HTTP -.->|exceeds 500 ms| Cancel
RPC -.->|exceeds 400 ms| Cancel
style Cancel fill:#fdd
style DB fill:#dfd
style Pay fill:#dfd
style Inv fill:#dfd
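The discipline: for calls made in sequence, the child timeouts should sum to less than the parent's remaining budget; for calls made in parallel, the largest child timeout should fit inside it. Either way, leave slack for serialization, lock acquisition, and the layer's own response time. A child's effective deadline is the earliest of its own and all its ancestors' deadlines.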
Child contexts inherit the parent's deadline and can never extend it. If the parent has 1 second remaining and you create a child with WithTimeout(ctx, 5*time.Second), the child gets 1 second — not 5.
Always check the remaining budget before setting child timeouts:
func (s *OrderService) PlaceOrder(ctx context.Context, order Order) error {
// Check how much time the parent has left
if deadline, ok := ctx.Deadline(); ok {
remaining := time.Until(deadline)
if remaining < 100*time.Millisecond {
return fmt.Errorf("insufficient time budget: %v remaining", remaining)
}
}
// Each step gets a fraction of the remaining budget
inventoryCtx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
defer cancel()
if err := s.inventory.Reserve(inventoryCtx, order); err != nil {
return fmt.Errorf("reserve inventory: %w", err)
}
paymentCtx, cancel := context.WithTimeout(ctx, 2*time.Second)
defer cancel()
if err := s.payment.Charge(paymentCtx, order); err != nil {
return fmt.Errorf("charge payment: %w", err)
}
return nil
}
Forgetting defer cancel() after context.WithTimeout or context.WithCancel keeps the child context's timer and its registration with the parent alive until the deadline fires or the parent is cancelled. go vet's lostcancel check flags this, so pair every WithTimeout or WithCancel with an immediate defer cancel().
Database and HTTP Calls: Always Use Context Variants
[Go net/http] The standard library's database and HTTP APIs have context-aware variants. Use QueryRowContext, ExecContext, and http.NewRequestWithContext:
func (r *UserRepository) FindByID(ctx context.Context, id string) (*User, error) {
var user User
err := r.db.QueryRowContext(ctx,
"SELECT id, name, email FROM users WHERE id = $1", id,
).Scan(&user.ID, &user.Name, &user.Email)
if err != nil {
if errors.Is(err, context.DeadlineExceeded) {
return nil, fmt.Errorf("database query timed out: %w", err)
}
if errors.Is(err, sql.ErrNoRows) {
return nil, ErrNotFound
}
return nil, err
}
return &user, nil
}
func (h *UserHandler) GetUser(w http.ResponseWriter, r *http.Request) {
// 500ms budget for the entire handler, including DB round-trip
ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
defer cancel()
userID := r.URL.Query().Get("id")
user, err := h.userRepo.FindByID(ctx, userID)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
json.NewEncoder(w).Encode(user)
}
When the 500ms deadline expires, a driver that honours cancellation aborts the query, and Scan returns an error that matches context.DeadlineExceeded.
Fan-Out Without Data Races
A common pattern is calling several downstream services in parallel and combining the results. For long-lived streaming workloads, a worker pool with bounded concurrency is often a better fit than ad-hoc fan-out. For request-scoped fan-out, a naive implementation using a shared struct and an error channel is easy to get wrong. Writing distinct fields of one struct from different goroutines is not a race by itself, but reading the struct before every writer has finished, or letting two goroutines assign the same variable (a shared err, a shared slice), is a data race, and Go gives no guarantees about the result.
Use golang.org/x/sync/errgroup, which manages goroutine lifecycle, cancels the group on the first error, and provides a clean model for aggregating results safely:
import "golang.org/x/sync/errgroup"
type Product struct {
ID string
Name string
Price float64
Reviews []Review
Inventory int
Recommendations []string
}
func (s *ProductService) GetProductDetails(ctx context.Context, productID string) (*Product, error) {
ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
defer cancel()
var (
product Product
mu sync.Mutex // protects all writes to product
)
g, ctx := errgroup.WithContext(ctx)
g.Go(func() error {
info, err := s.productInfoService.Get(ctx, productID)
if err != nil {
return fmt.Errorf("product info: %w", err)
}
mu.Lock()
product.ID = info.ID
product.Name = info.Name
product.Price = info.Price
mu.Unlock()
return nil
})
g.Go(func() error {
reviews, err := s.reviewService.GetReviews(ctx, productID)
if err != nil {
return fmt.Errorf("reviews: %w", err)
}
mu.Lock()
product.Reviews = reviews
mu.Unlock()
return nil
})
g.Go(func() error {
inventory, err := s.inventoryService.GetStock(ctx, productID)
if err != nil {
return fmt.Errorf("inventory: %w", err)
}
mu.Lock()
product.Inventory = inventory
mu.Unlock()
return nil
})
g.Go(func() error {
recs, err := s.recommendationService.Get(ctx, productID)
if err != nil {
return fmt.Errorf("recommendations: %w", err)
}
mu.Lock()
product.Recommendations = recs
mu.Unlock()
return nil
})
if err := g.Wait(); err != nil {
return nil, err
}
return &product, nil
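}
When errgroup is initialised with errgroup.WithContext(ctx), the first goroutine to return a non-nil error cancels the derived context, which cancels every other in-flight call that honours it. Wait() returns that first error. The goroutines here write disjoint fields and the result is read only after Wait(), so the mutex is a safeguard against later edits that share a field rather than a strict requirement.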
If you want all goroutines to complete even when one fails (partial results), do not use errgroup.WithContext. Use plain goroutines and collect results through typed channels.
Distributed Tracing with Context
[OpenTelemetry Sampling] In distributed systems, context.Context carries trace IDs across service boundaries through OpenTelemetry. When you pass ctx to an instrumented HTTP client, the instrumentation injects the trace ID into HTTP headers. The receiving service extracts it and continues the trace, all through context propagation.
gRPC is even simpler: deadlines propagate automatically. A client-side 2-second timeout flows to the server, which can check the remaining budget:
// Client: deadline propagates to server automatically
ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
defer cancel()
resp, err := client.GetUser(ctx, &pb.GetUserRequest{Id: userID})
// Server: access propagated deadline
func (s *server) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
deadline, ok := ctx.Deadline()
if ok {
remaining := time.Until(deadline)
slog.Debug("deadline from upstream", "remaining", remaining)
}
return s.repo.FindByID(ctx, req.Id)
}
When service A calls service B with 5 seconds remaining, and B calls service C with the same ctx, C inherits the remaining time from A's original deadline, not 5 fresh seconds. If 4 of those seconds are gone by the time B calls C, C gets 1 second. Monitor grpc_server_handling_seconds with deadline_exceeded labels to catch cascading timeout pressure.
Graceful Shutdown and Long-Running Work
Context cancellation enables clean shutdown. When a signal arrives, cancel the root context — everything descending from it stops:
func main() {
ctx, cancel := context.WithCancel(context.Background())
server := &http.Server{Addr: ":8080", Handler: NewHandler(ctx)}
go server.ListenAndServe()
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
<-sigChan
cancel() // Signal all in-flight work to stop
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 30*time.Second)
defer shutdownCancel()
server.Shutdown(shutdownCtx)
}
For CPU-bound loops, poll ctx.Err() periodically (not every iteration):
func processLargeDataset(ctx context.Context, items []Item) error {
for i, item := range items {
if i%100 == 0 {
if err := ctx.Err(); err != nil {
return fmt.Errorf("cancelled at item %d: %w", i, err)
}
}
if err := processItem(ctx, item); err != nil {
return err
}
}
return nil
}
Context Values: Request Metadata Only
Use context.WithValue only for request-scoped metadata: trace IDs, request IDs, user IDs. Always use unexported key types to prevent collisions:
type contextKey string
const requestIDKey contextKey = "request-id"
func RequestIDMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
requestID := r.Header.Get("X-Request-ID")
if requestID == "" {
requestID = uuid.New().String()
}
ctx := context.WithValue(r.Context(), requestIDKey, requestID)
w.Header().Set("X-Request-ID", requestID)
next.ServeHTTP(w, r.WithContext(ctx))
})
}
Never store configuration, database connections, or service dependencies in context values: doing so hides dependencies and makes code harder to test. Pass those as function parameters or struct fields. Context values are immutable; every WithValue call creates a linked list node, and lookups scan the chain.
Detecting and Testing Leaks
Goroutine leaks are silent until they exhaust resources. Use go.uber.org/goleak to detect them automatically:
import "go.uber.org/goleak"
func TestMain(m *testing.M) {
goleak.VerifyTestMain(m)
}
func TestOrderProcessing(t *testing.T) {
defer goleak.VerifyNone(t)
ctx, cancel := context.WithCancel(context.Background())
svc := NewOrderService(mockDeps)
svc.ProcessOrder(ctx, testOrder)
cancel()
// goleak.VerifyNone will fail if any goroutines are still running
}
Test timeout and cancellation paths without timing-dependent sleeps:
func TestUserService_FetchWithTimeout(t *testing.T) {
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()
slowMock := &SlowUserAPI{delay: 500 * time.Millisecond}
service := NewUserService(slowMock)
_, err := service.FetchUser(ctx, "user-123")
if !errors.Is(err, context.DeadlineExceeded) {
t.Errorf("expected deadline exceeded, got: %v", err)
}
}
In production, expose pprof on an internal port to inspect goroutine profiles. As a rough rule of thumb, a healthy service with 50 concurrent connections has fewer than 200 goroutines. Common leak patterns: goroutines blocked in chan receive without a select on ctx.Done(), missing defer cancel(), and time.After inside loops (use time.NewTimer with Reset instead).
Production Checklist
Before shipping a Go service:
- All HTTP handlers derive context from r.Context()
- All database calls use QueryContext/ExecContext variants
- All HTTP client requests use http.NewRequestWithContext
- Every context.WithTimeout or context.WithCancel paired with defer cancel()
- Fan-out uses golang.org/x/sync/errgroup or typed channels
- Long CPU-bound loops call ctx.Err() at regular intervals
- Context values use unexported key types, carry only metadata (trace IDs, user IDs)
- Background fire-and-forget operations use context.WithoutCancel(ctx) to survive client disconnect (note: this also removes the parent deadline; add your own context.WithTimeout if needed)
- Tests use goleak.VerifyTestMain to catch goroutine leaks in CI
- pprof exposed on internal port for production profiling
- Graceful shutdown cancels the root context and waits with a separate deadline
Postmortem: The Cancellation Leak That Took Out Checkout
A real production incident from a checkout service running on Kubernetes. The deployment had four pods, each capped at 512 MiB. Steady-state traffic was 800 requests per second with a p99 latency of 70 ms. The service passed every health check, every readiness probe, and every smoke test for forty minutes after a routine deploy. Then the first pod was OOMKilled. Within ninety seconds, all four pods cycled and the cart endpoint started returning 503 from the load balancer. The on-call pulled a goroutine profile from a survivor and counted forty-one thousand goroutines parked in chan receive, all rooted in the same handler.
The leaked code was a "fire-and-forget" recommendation prefetch. The handler kicked off a goroutine on every cart view to warm a downstream recommendations cache. The goroutine called the recommendation service over HTTP without using the request context, and it read the response body in a loop guarded only by a hardcoded ten-second sleep between retries. When the recommendation service degraded from 30 ms to 9 seconds per call, every prefetch goroutine parked for the full retry window. At 800 requests per second that meant 800 new goroutines per second piling up against a 60 to 90 second drain time. The container memory headroom evaporated in roughly six minutes once the upstream slowed.
The fix had three pieces, all of which should have been there from the start. First, the prefetch derived its own bounded child context from the request context using WithTimeout so an upstream slowdown could not pin the goroutine indefinitely. Second, the retry loop selected on ctx.Done() instead of sleeping unconditionally, so cancellation took effect within milliseconds. Third, the handler used a semaphore to cap concurrent prefetches per pod, so even with cancellation working correctly the service degraded gracefully under upstream pressure rather than fanning out to unbounded goroutines.
type recommendationPrefetcher struct {
client *http.Client
sem chan struct{} // bounded concurrency, e.g. make(chan struct{}, 256)
}
func (p *recommendationPrefetcher) Prefetch(parent context.Context, userID string) {
select {
case p.sem <- struct{}{}:
default:
// Shed load: skip prefetch when at concurrency cap
return
}
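// parent is the request context. Detach from its cancellation (Go 1.21+) so the
// prefetch is not killed the moment the handler returns, then bound it with
// its own timeout.
ctx, cancel := context.WithTimeout(context.WithoutCancel(parent), 800*time.Millisecond)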
go func() {
defer cancel()
defer func() { <-p.sem }()
backoff := 50 * time.Millisecond
for attempt := 0; attempt < 3; attempt++ {
req, _ := http.NewRequestWithContext(ctx, http.MethodGet,
"https://recs.internal/v1/users/"+userID, nil)
resp, err := p.client.Do(req)
if err == nil {
resp.Body.Close()
return
}
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return
}
select {
case <-ctx.Done():
return
case <-time.After(backoff):
backoff *= 2
}
}
}()
}
The lesson generalises. Any goroutine that outlives its triggering request must (a) derive a bounded context, (b) select on ctx.Done() in every wait, and (c) cap its own concurrency. Two of the three on their own do not save you when an upstream goes slow.
Testing Context-Aware Code Without Sleeps
Production context bugs hide in two places: code paths that ignore cancellation, and code paths that only fire on cancellation (cleanup handlers, retry loops, partial-result paths). Table-driven tests with explicit WithCancel give you deterministic coverage of both branches without time.Sleep calls that flake under CPU pressure.
func TestOrderService_PlaceOrder_Cancellation(t *testing.T) {
cases := []struct {
name string
setupCtx func(t *testing.T) (context.Context, context.CancelFunc)
repoLatency time.Duration
wantErr error
wantSaveCall bool
}{
{
name: "happy path: context not cancelled",
setupCtx: func(t *testing.T) (context.Context, context.CancelFunc) {
return context.WithCancel(context.Background())
},
repoLatency: 10 * time.Millisecond,
wantErr: nil,
wantSaveCall: true,
},
{
name: "cancel before call: short-circuits without touching repo",
setupCtx: func(t *testing.T) (context.Context, context.CancelFunc) {
ctx, cancel := context.WithCancel(context.Background())
cancel() // already cancelled
return ctx, func() {}
},
wantErr: context.Canceled,
wantSaveCall: false,
},
{
name: "deadline exceeded mid-call: returns DeadlineExceeded",
setupCtx: func(t *testing.T) (context.Context, context.CancelFunc) {
return context.WithTimeout(context.Background(), 5*time.Millisecond)
},
repoLatency: 100 * time.Millisecond,
wantErr: context.DeadlineExceeded,
wantSaveCall: false,
},
}
for _, tc := range cases {
t.Run(tc.name, func(t *testing.T) {
defer goleak.VerifyNone(t)
ctx, cancel := tc.setupCtx(t)
defer cancel()
repo := &fakeRepo{latency: tc.repoLatency}
svc := NewOrderService(repo)
err := svc.PlaceOrder(ctx, Order{ID: "o-1"})
if !errors.Is(err, tc.wantErr) {
t.Fatalf("err mismatch: got %v, want %v", err, tc.wantErr)
}
if got := repo.saveCalled.Load(); got != tc.wantSaveCall {
t.Fatalf("save called: got %v, want %v", got, tc.wantSaveCall)
}
})
}
}
Three properties are worth highlighting. The setupCtx factory returns a fresh context per case so cases cannot leak state. The pre-cancelled case uses cancel() immediately rather than a tiny timeout, which removes scheduler dependence. And goleak.VerifyNone(t) runs at the end of every case, catching the regression where a buggy fix accepts cancellation but forgets to release a worker goroutine.
errgroup vs sync.WaitGroup: The Decision Rule
Both primitives coordinate goroutine lifetime. They are not interchangeable. Pick errgroup.WithContext when any sibling failure should abort the others — the canonical case is a fan-out aggregation where a partial result is worse than a fast failure (an order page where the price service fails, so showing reviews and inventory alone is misleading). Pick a plain sync.WaitGroup when each goroutine's outcome is independent and you want every result, even if some fail — for example, writing audit events to several sinks where a flaky logger should not block the others.
| Question | errgroup.WithContext | sync.WaitGroup |
|---|---|---|
| Should one failure cancel the others? | Yes | No |
| Do you need the first error returned? | Yes (g.Wait() returns it) | Roll your own with a results slice |
| Is partial success acceptable? | No | Yes |
| Are the goroutines bounded by request scope? | Almost always | Sometimes (background workers) |
| Do you call defer cancel() on the parent ctx? | Yes, when you created it with WithTimeout or WithCancel (the group cancels its own derived ctx) | Not applicable |
The trap is using errgroup.WithContext for fire-and-forget background work. The first error cancels all siblings, which is exactly wrong when each sibling is meant to run independently to completion. If you find yourself wrapping every g.Go body so that it logs the error and returns nil to avoid aborting the group, that is a signal to switch to sync.WaitGroup plus an explicit error slice.
Frequently Asked Questions
How does context cancellation prevent goroutine leaks in Go?
When a parent context is cancelled, all derived child contexts are cancelled too, signaling goroutines watching ctx.Done() to stop work and exit. Without this signal, goroutines blocked on I/O accumulate indefinitely until the process runs out of memory.
Should you use context.Background() or context.TODO() for background work in Go?
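Use context.Background() for intentionally long-lived work that is not tied to any request, such as process-level workers. For work started by a request that must outlive it (e.g., saving a payment after the client disconnects), context.WithoutCancel(ctx) (Go 1.21+) keeps the request's values while dropping its cancellation and deadline. Use context.TODO() as a placeholder when you plan to add proper context propagation later.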
What is the difference between context.WithTimeout and context.WithDeadline?
WithTimeout sets a relative duration from now (e.g., 5 seconds), while WithDeadline sets an absolute wall-clock time. WithTimeout is syntactic sugar — it calls WithDeadline(parent, time.Now().Add(timeout)) internally.
How do you detect goroutine leaks in a Go service?
Export the runtime.NumGoroutine() metric to your monitoring system and alert on sustained growth. In development, use go test -count=1 -race with goleak to detect goroutines that outlive the test.
Keep Reading
- Go Graceful HTTP Shutdown: Zero-Downtime Production Patterns — Signal handling, two-phase drain, and Kubernetes probe coordination that builds on root context cancellation
- Go Worker Pool Pattern: Production-Ready Concurrency Control — Bounded concurrency with context-aware workers, backpressure, and graceful drain
- Building Resilient Distributed Systems with Go — Circuit breakers, bulkheads, and timeout propagation patterns that depend on correct context usage
Engineering Team
A multidisciplinary team of backend engineers, architects, and DevOps practitioners shipping deep dives into distributed systems and production infrastructure.