Go context.Context Cheat Sheet: Cancellation, Timeouts & Gotchas

BackendBytes Engineering Team

Key Takeaways

  • Every WithCancel/WithTimeout without defer cancel() keeps the child context and its timer registered until the parent is cancelled or the deadline fires, and any goroutine blocked on its Done() stays parked with it; at scale these leaks compound until the service OOMs
  • Child contexts inherit the parent deadline and can never extend it: a 5-second WithTimeout on a 1-second parent budget always gets 1 second, not 5
  • Request-scoped values require unexported key types, not strings: string keys collide across packages, and the later value silently shadows the earlier one
  • Never create a fresh Background() with WithTimeout inside a request handler: you orphan the http.Request.Context() and lose client disconnect signals

This is the classic Go context production leak: a handler spawns a goroutine, returns a 200, and the goroutine outlives the request because nobody called defer cancel(). Multiply that by 10,000 requests per second and the process leaks goroutines until it is OOMKilled. We debugged this exact pattern on multiple production services; it is the single most common shape of goroutine leak.

The six constructors you need

[Go context]

Every context.Context tree starts with one of two roots. Everything else is a child wrapper.

Key Points

Background() is your tree root. WithCancel, WithTimeout, and WithDeadline leak without defer cancel(). Never pass nil, store context in a struct, or use Value for function arguments.

  • Always pair WithCancel/Timeout/Deadline with defer cancel() to prevent leaks
  • Use WithTimeout for RPC calls; WithDeadline for splitting parent budgets
  • Pass context as the first parameter; never cache it in a struct
graph TD
    BG["context.Background()"] --> WC["WithCancel(bg)"]
    BG --> WT["WithTimeout(bg, 5s)"]
    WC --> WV["WithValue(wc, key, val)"]
    WC -->|"spawns"| G1["goroutine A<br/>← ctx.Done()"]
    WT -->|"spawns"| G2["goroutine B<br/>← ctx.Done()"]
    WV -->|"spawns"| G3["goroutine C<br/>ctx.Value(key)"]

Constructor reference

ctx := context.Background()                                  // root, never cancelled
ctx := context.TODO()                                        // placeholder, don't ship
ctx, cancel := context.WithCancel(parent)                    // manual cancellation
ctx, cancel := context.WithTimeout(parent, 2*time.Second)    // relative deadline
ctx, cancel := context.WithDeadline(parent, absoluteTime)    // absolute deadline
ctx := context.WithValue(parent, key, val)                   // request-scoped value

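Background() is the root in main, tests, and background workers. TODO() marks a call site where a real context has not been threaded through yet; replace it before deploying. Every cancel-returning constructor should have defer cancel() on the next line: a missed cancel() keeps the child context (and its timer, if any) alive until the parent is cancelled or the deadline fires, along with any goroutine waiting on it.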

How cancellation propagates

A context is a tree. When a parent cancels, every descendant sees Done() close simultaneously — one signal fans out across every goroutine that branched off.

graph TD
    A["handler<br/>r.Context()"] --> B["WithTimeout 2s<br/>client.Get()"]
    A --> C["WithCancel<br/>spawner"]
    C --> D["db.Query()"]
    C --> E["cache.Set()"]
    C --> F["publish()"]
    B -. deadline fires .-> X(("Done() closes<br/>this subtree"))
    C -. cancel() called .-> Y(("Done() closes<br/>all 3 goroutines"))

Every leaf goroutine must watch <-ctx.Done() alongside its real work. Otherwise cancellation can't reach it and the goroutine leaks.

Cancellation and timeouts

Use WithCancel for explicit cancellation. Use WithTimeout for RPC calls (2s budget). Use WithDeadline when splitting a parent budget across sequential calls:

ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
defer cancel()
resp, err := client.Get(ctx, "/orders/42")
if errors.Is(err, context.DeadlineExceeded) {
    return http.StatusGatewayTimeout, nil
}

We've debugged this mistake in production: never create a fresh context.Background() with WithTimeout inside a request handler—you orphan the request context and lose client disconnect signals.

Request-scoped values

WithValue stores per-request data (request IDs, trace spans, user) across API boundaries. Not for function arguments.

type ctxKey int
const requestIDKey ctxKey = iota
 
func WithRequestID(ctx context.Context, id string) context.Context {
    return context.WithValue(ctx, requestIDKey, id)
}
 
func RequestID(ctx context.Context) string {
    v, _ := ctx.Value(requestIDKey).(string)
    return v
}

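Always use an unexported key type rather than a built-in type such as string. Two packages that both pick the string key "id" collide, and the later WithValue shadows the earlier one. Type-assert on read; never assume the value is present.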

HTTP and gRPC integration

[Go net/http]

HTTP handlers get a request context via r.Context(). Wrap it with WithTimeout for your I/O budget:

func handler(w http.ResponseWriter, r *http.Request) {
    ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
    defer cancel()
    rows, err := db.QueryContext(ctx, "SELECT id FROM orders WHERE user_id=$1", userID)
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }
    defer rows.Close()
}

For outbound HTTP, use http.NewRequestWithContext(ctx, ...). For gRPC, methods take ctx as the first parameter. Deadlines propagate as grpc-timeout headers automatically.

Common gotchas

  • Store context in a struct: Contexts live per-call, not per-object [Go context]. Pass it explicitly to every method.
  • Pass nil context: Causes a panic. Use context.TODO() as a placeholder.
  • Forget defer cancel(): WithCancel, WithTimeout, WithDeadline leak if not cleaned up [Go context].
  • Use Value for optional arguments: Unreadable and untyped. Pass as a regular function parameter.
  • Ignore ctx.Err() after <-ctx.Done(): Done() fires for both cancellation and timeout. Call ctx.Err() to distinguish: context.Canceled → 499 (client gone), DeadlineExceeded → 504 (too slow).

Detecting context leaks in tests and production

The runtime.NumGoroutine() count is the cheap canary. Sample it before and after a test; non-zero delta = leak. The goleak package automates this:

import "go.uber.org/goleak"
 
func TestMain(m *testing.M) {
    goleak.VerifyTestMain(m)
}
 
// per-test guard — fails the test if any goroutine outlives this function
func TestProcessOrder(t *testing.T) {
    defer goleak.VerifyNone(t)
    // ... test body
}

In production, client_golang's default Go collector already exports go_goroutines for free — scrape that. To expose the count yourself (e.g. a custom registry without that collector), use a GaugeFunc with a name that can't collide with the built-in:

var goroutines = promauto.NewGaugeFunc(
    prometheus.GaugeOpts{Name: "app_goroutines_live"},
    func() float64 { return float64(runtime.NumGoroutine()) },
)

Alert when the goroutine count grows monotonically over an hour — that pattern is almost always a forgotten defer cancel() or a goroutine waiting on a channel that nobody closes.

Cancellation propagation rules

A few rules that catch >90% of real-world bugs: [Go context]

  1. Always pass ctx as the first parameter of any function that does I/O, blocks on a channel, or calls another ctx-aware function.
  2. Never wrap ctx inside another struct with the goal of "saving keystrokes." The containedctx linter flags a context.Context held in a struct field. (staticcheck SA1029 is a different check — it flags built-in types used as WithValue keys, the rule behind the unexported-key-type advice above.)
  3. select { case <-ctx.Done(): } everywhere that you would otherwise block. A for loop reading from a channel becomes:
for {
    select {
    case <-ctx.Done():
        return ctx.Err()
    case msg := <-msgs:
        if err := process(ctx, msg); err != nil {
            return err
        }
    }
}
  4. Cancel-on-parent: WithCancel(parent) cancels when parent cancels, so wire your tree so that one root cancellation drains the whole stack.
  5. Don't recreate roots inside the request path. context.Background() belongs in main() and a few cron entry points; everything downstream derives from the request ctx.

What ctx.Err() returns and why it matters

After <-ctx.Done() fires, call ctx.Err() to know why the context cancelled:

| Return value | Trigger | HTTP mapping | gRPC mapping |
| --- | --- | --- | --- |
| nil | Context still live: Done() not yet closed (a closed Done() always yields a non-nil Err()) | n/a | n/a |
| context.Canceled | Caller invoked cancel() or parent canceled | 499 (client closed request) | codes.Canceled |
| context.DeadlineExceeded | Deadline reached (WithTimeout, WithDeadline) | 504 (gateway timeout) | codes.DeadlineExceeded |

The HTTP 499 status code is non-standard but supported by Nginx and most observability tools[Go context]. Use it to distinguish "client gave up" (499) from "we ran out of budget" (504) — both metrics matter for SLO tracking, and conflating them masks real availability problems.

For Go 1.20+, WithCancelCause(parent) and context.Cause(ctx) let you attach an error to the cancellation, which propagates through the chain. Use it sparingly — most code paths only care about the canceled-vs-timed-out distinction above.

Pick the right constructor for the symptom

When a context bug bites in production, the question is "what kind of cancellation do I want." Route by intent, not by API name:

graph TD
    Need[I need to control<br/>a downstream operation] --> Q{Why am I<br/>cancelling?}
    Q -->|Caller might give up| Cancel[WithCancel<br/>+ defer cancel<br/>good for fan-out]
    Q -->|Operation has SLA| Timeout[WithTimeout<br/>+ defer cancel<br/>500ms RPC budget]
    Q -->|Need budget split<br/>across multiple steps| Deadline[WithDeadline<br/>+ defer cancel<br/>parent budget minus elapsed]
    Q -->|Carrying request data<br/>not cancellation| Value[WithValue<br/>request ID, user ID,<br/>trace span]
    Q -->|Multiple cancellation<br/>causes| Cause[WithCancelCause<br/>Go 1.20 plus<br/>read with context.Cause]
    Cancel --> Always[Always call cancel<br/>even on success path]
    Timeout --> Always
    Deadline --> Always
    Cause --> Always
    Value -.->|cancel propagates| Cancel
    style Always fill:#dfd
    style Value fill:#ffd

The diagram is the cheat-sheet for context bugs: every "why is my goroutine leaking" trace lands on a missing defer cancel() or a stored-in-struct context that outlived its request[Go context].

Production gotchas the linter will not catch

Static analysis flags the obvious leaks. The bugs that survive code review are subtler: a value lookup that quietly returns the zero value, a parent context replaced halfway through a request, a select that races with a closed channel. The next three patterns show up in real incidents and are worth memorising.

Type-assert with the comma-ok form, every time

The single-value form of the assertion on a context value lookup compiles, but it panics at run time when the key is absent or the value was stored under a different type. The usual quick fix, a comma-ok assertion with the bool discarded, swaps the panic for a silently returned zero value. The compiler cannot help with either, because the lookup returns the empty interface. The fix is the comma-ok assertion plus an explicit fallback path, so a missing trace identifier becomes an obvious log line, not a hidden empty string in three downstream services.

type ctxKey int
 
const (
    requestIDKey ctxKey = iota
    tenantIDKey
)
 
// Bad: the single-value assertion panics if the key is missing or the
// value has the wrong type. Discarding the bool (v, _ := ...) only hides
// the bug: downstream logs show empty strings.
func badLookup(ctx context.Context) string {
    return ctx.Value(requestIDKey).(string)
}
 
// Good: explicit comma-ok branch and a sentinel so a missing value
// is loud, not silent. Production telemetry can alert on the sentinel.
func RequestID(ctx context.Context) (string, bool) {
    id, ok := ctx.Value(requestIDKey).(string)
    if !ok || id == "" {
        return "", false
    }
    return id, true
}
 
func handleOrder(ctx context.Context, orderID string) error {
    rid, ok := RequestID(ctx)
    if !ok {
        // Loud failure rather than logs full of empty request IDs.
        return fmt.Errorf("missing request id in ctx for order %s", orderID)
    }
    log.Printf("rid=%s order=%s", rid, orderID)
    return nil
}

The same rule applies to tenant identifiers, user identities, locale tags, and feature-flag bundles. Any helper that hides the bool return value behind a single string will eventually mask a wiring bug across a service boundary.

Wire the signal handler into the cancel chain, not next to it

A common shutdown bug is two parallel cancellation paths that never meet. The HTTP server takes one context, the signal handler takes another, and the worker pool reads from a third. When SIGTERM fires, the signal handler returns from main while in-flight handlers keep going on the original Background() and are killed mid-write by the runtime. The fix is to root every long-lived goroutine in a single signal.NotifyContext so one ctrl-C drains the entire tree.

func main() {
    ctx, stop := signal.NotifyContext(context.Background(),
        os.Interrupt, syscall.SIGTERM)
    defer stop()
 
    srv := &http.Server{
        Addr:    ":8080",
        Handler: router(),
        // BaseContext returns the parent for every incoming request,
        // so SIGTERM cancels in-flight handler ctxs as well.
        BaseContext: func(_ net.Listener) context.Context { return ctx },
    }
 
    go func() {
        if err := srv.ListenAndServe(); err != nil &&
            !errors.Is(err, http.ErrServerClosed) {
            log.Fatalf("listen: %v", err)
        }
    }()
 
    <-ctx.Done()
    shutdownCtx, cancel := context.WithTimeout(
        context.Background(), 25*time.Second)
    defer cancel()
    if err := srv.Shutdown(shutdownCtx); err != nil {
        log.Printf("graceful shutdown failed: %v", err)
    }
}

signal.NotifyContext was added in Go 1.16 and is the simplest correct entry point for production servers. Pre-1.16 code that pairs a manual signal.Notify channel with a separate cancel function is a common source of orphaned goroutines on shutdown: the channel fires and the handler exits, but unless something calls that cancel function, no descendant context ever sees Done() close.

Propagate ctx through OpenTelemetry, not around it

Trace propagation needs the same context that cancels the request. Two patterns leak in production: starting a span with context.Background() so the trace stays disconnected from the parent, and storing a span pointer outside the context so cancellation cannot reach the export pipeline. The otelhttp middleware does the first half automatically; the second half is a discipline of always reading the span from the request context, not a struct field.

import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/trace"
    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)
 
func wireRouter(h http.Handler) http.Handler {
    return otelhttp.NewHandler(h, "http.server")
}
 
func handleCheckout(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    tracer := otel.Tracer("checkout")
    ctx, span := tracer.Start(ctx, "checkout.validate")
    defer span.End()
 
    // Pass the derived ctx everywhere — never start a child span
    // from Background() or you orphan the trace.
    if err := validate(ctx, r); err != nil {
        span.RecordError(err)
        http.Error(w, "invalid", http.StatusBadRequest)
        return
    }
    if err := charge(ctx, r); err != nil {
        span.RecordError(err)
        http.Error(w, "payment failed", http.StatusBadGateway)
        return
    }
    w.WriteHeader(http.StatusOK)
}

Two follow-on rules. First, never call trace.SpanFromContext and then store the result in a struct that outlives the request: once the handler's deferred span.End() has run, the span is finished and later writes to it are silently dropped. Second, when crossing into a goroutine that survives the request (a worker that sends a confirmation email asynchronously, for example), build a fresh root context from Background(), attach a span link with trace.WithLinks, and let the original request context die. That preserves the trace topology without binding the worker to the inbound HTTP deadline.

Choosing context.Context vs sync.WaitGroup

Both primitives coordinate goroutines, but they answer different questions. WaitGroup answers "is everyone done?" and is purely a counter. Context answers "should you stop?" and is purely a cancellation signal. Real services need both, and the rule of thumb is: use context to broadcast cancel, use WaitGroup to wait for clean exit.

| Question | Reach for | Why |
| --- | --- | --- |
| When should this goroutine give up? | context.Context | Carries deadline, cancel signal, and request values |
| How do I know all spawned goroutines drained? | sync.WaitGroup | Counts up on Add, blocks on Wait until all Done |
| One goroutine errors: do the rest abort? | errgroup.Group | Wraps WaitGroup with first-error short-circuit and shared ctx |
| Bound concurrency to N workers | errgroup.SetLimit or buffered channel | Context alone has no concurrency limit |
| Long-running fan-out with per-task deadline | Context + errgroup per task | Parent ctx for global cancel, child ctx per task budget |

The most common confusion is using a WaitGroup to "wait for cancel," which it cannot do, or using only context for fan-in, which leaks because no goroutine actually reports completion. The errgroup.WithContext constructor merges both into a single API and is the right default for new fan-out code.

import "golang.org/x/sync/errgroup"
 
func fetchAll(parent context.Context, urls []string) ([][]byte, error) {
    g, ctx := errgroup.WithContext(parent)
    g.SetLimit(8) // bounded concurrency
 
    out := make([][]byte, len(urls))
    for i, u := range urls {
        i, u := i, u // pin for closure
        g.Go(func() error {
            // Inherit the group's ctx so the first error cancels siblings.
            req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
            if err != nil {
                return fmt.Errorf("build request %s: %w", u, err)
            }
            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                return fmt.Errorf("fetch %s: %w", u, err)
            }
            defer resp.Body.Close()
            body, err := io.ReadAll(resp.Body)
            if err != nil {
                return err
            }
            out[i] = body
            return nil
        })
    }
    if err := g.Wait(); err != nil {
        return nil, err
    }
    return out, nil
}

The errgroup.WithContext ctx cancels automatically when the first goroutine returns a non-nil error. Sibling goroutines see ctx.Done() close and abort their HTTP calls cleanly. Without it you would have to write a manual WaitGroup, a shared context.WithCancel, and a sync.Once around the error capture — every line of that machinery is a place for a leak to hide.

Incident retrospective: the ten-minute payment outage

A real outage from a payments gateway, redacted. The symptom was a sudden p99 latency spike on /charge from 180 ms to 9 s, followed by a wave of 504s from the upstream load balancer. CPU was flat, memory was rising, and the goroutine count climbed from 2 thousand to 140 thousand in nine minutes before the on-call rolled back.

The root cause was a four-line change to a fraud-check helper. The previous version derived a five-second timeout from the request context. The change moved the timeout into a "shared" helper that took its own context root for "cleanliness," intending to prevent fraud checks from being killed when clients gave up. What the author missed: the helper now had no upstream cancel signal, so when the downstream fraud service stalled at three seconds per call, every in-flight request piled goroutines up forever, each holding an open Postgres transaction.

Three things made the incident recoverable. First, the goroutine gauge was already on the on-call dashboard and tripped the rate-of-change alert at 90 seconds. Second, the rollback path was a single revert because the change touched one file. Third, the post-incident test added a goleak.VerifyTestMain to the fraud package, which would have caught the orphaned context in CI before merge.

The four lessons that went into the team runbook:

  1. Helpers that own their own context root are forbidden in the request path. Every helper takes the caller's ctx and derives a child with WithTimeout only.
  2. The goroutine count gauge with a rate-of-change alert is the cheapest leading indicator of a context leak. Add it to every Go service before adding any other custom metric.
  3. Every package that spawns goroutines runs goleak.VerifyTestMain in tests. The two-line addition catches at least one bug per quarter on a busy codebase.
  4. Code review checklist gains one bullet: any new context.Background() outside main, tests, or named cron entry points needs an explicit comment justifying it. No comment, no merge.

The incident took ten minutes to detect, two minutes to roll back, and the postmortem took an hour. The cost of the missing defer cancel() discipline is paid in those ten minutes every time it ships.

Frequently Asked Questions

When should I call cancel() if I'm deriving a context in a loop?

Call cancel() as soon as each step finishes, before the next iteration. Do not defer cancel() inside the loop body: deferred calls run only when the enclosing function returns, so every iteration's context and timer stay alive until then.

for _, url := range urls {
    ctx, cancel := context.WithTimeout(parent, 1*time.Second)
    body, err := fetch(ctx, url)
    cancel() // release the timer now instead of at function return
    if err != nil {
        return err
    }
    _ = body // use the result here
}

Is context.TODO() safe to ship?

No. go vet will not flag it, but it's a signal to reviewers that a real context was not threaded. Replace it before merging to main.

Why shouldn't I store context in a struct like Handler.ctx?

Because the context is tied to a single request's lifetime. If you cache it in a struct that lives longer (like a handler instance), it will be cancelled after the first request, and all subsequent requests will use a dead context.

What's the difference between DeadlineExceeded and Canceled?

DeadlineExceeded means the deadline passed (timeout). Canceled means someone called cancel() or the parent context was canceled. Map them to different HTTP status codes: 504 and 499, respectively.

Constructor reference table

A quick comparison of the constructors and when to reach for each:

| Constructor | When to use | Cancellation source | Must defer cancel()? |
| --- | --- | --- | --- |
| Background() | Top of main, server startup | Never | No |
| TODO() | Placeholder during refactor | Never | No |
| WithCancel(parent) | Caller might give up early | Explicit cancel() call | Yes |
| WithTimeout(parent, d) | RPC budget, request SLA | Timer expiry or cancel() | Yes |
| WithDeadline(parent, t) | Splitting a parent budget | Deadline reached or cancel() | Yes |
| WithValue(parent, k, v) | Carrying request-scoped data | Inherits parent | No |
| WithCancelCause(parent) | Need cause attribution (Go 1.20+) | cancel(err) with cause | Yes |
