#go #production #reliability #kubernetes

Go Graceful HTTP Shutdown: Zero-Downtime Production Patterns

BackendBytes Engineering Team

Mar 2, 2026

Key Takeaways

→ Kubernetes removes pods from endpoints asynchronously after SIGTERM, so requests keep arriving after shutdown begins and cause 502s
→ Two-phase shutdown: mark readiness unhealthy first (drains the LB), then call Server.Shutdown
→ Use sync.WaitGroup to wait for background goroutines before exit; Server.Shutdown doesn't

Your Server.Shutdown works perfectly — and every rolling update still fires a burst of 502s for 5 seconds. Kubernetes removes pods from Service endpoints asynchronously after sending SIGTERM, so requests keep arriving at pods that have already closed their listeners. We debugged this exact race on multiple production Go services, and the fix is always the same: a two-phase shutdown that marks readiness unhealthy 5 seconds before closing the listener.

Server.Shutdown^{[Go Language Specification]} closes the listeners, closes idle connections, and waits for in-flight requests to finish. It does not wait for background goroutines: queue consumers, cron jobs, and cache warmers keep running. It also does not solve the Kubernetes timing race: if you call Shutdown immediately on SIGTERM, in-flight requests still complete, but new requests that the load balancer keeps routing to the pod hit a closed listener and fail. The fix is a two-phase shutdown, detailed in the lifecycle sequence below:

sequenceDiagram
    participant K as Kubernetes
    participant LB as Service / LB
    participant App as Go process
    participant Bg as Background workers
    K->>App: SIGTERM
    Note over App: Phase 1: signal readiness=unhealthy
    App->>LB: /readyz returns 503
    LB-->>K: drop endpoint asynchronously<br/>eventual, ~seconds
    Note over App: Phase 1 sleep ~5s<br/>ensures LB has dropped us
    Note over App: Phase 2 — Server.Shutdown ctx, 25s
    App->>App: stop accepting new conns<br/>wait for in-flight to finish
    App->>Bg: cancel ctx
    Bg-->>App: workers drain (WaitGroup)
    App->>K: process exit 0
    Note over K: K8s sends SIGKILL<br/>after grace period, 30s default

The diagram is the timing-race lesson in one picture: the LB drop is asynchronous, so calling Server.Shutdown immediately on SIGTERM means new requests routed during the LB's drop window land on a closed listener. The 5-second prelude is the cheapest fix; the WaitGroup tail is the part Server.Shutdown won't do for you.
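The half of that claim that concerns in-flight requests can be checked locally. The sketch below is an illustration rather than production code: it starts a server on an ephemeral loopback port with a deliberately slow handler, calls Shutdown while one request is still running, and reports the status that request received. The function name `inFlightSurvives` and the 200 ms handler delay are assumptions made for this example.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// inFlightSurvives calls Shutdown while a request is being handled and
// returns the status code that request ends up with.
func inFlightSurvives() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	started := make(chan struct{})
	var once sync.Once
	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		once.Do(func() { close(started) }) // the request is now in flight
		time.Sleep(200 * time.Millisecond) // simulated slow work
		w.WriteHeader(http.StatusOK)
	})}
	go srv.Serve(ln) // returns http.ErrServerClosed after Shutdown

	type result struct {
		code int
		err  error
	}
	resCh := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/")
		if err != nil {
			resCh <- result{0, err}
			return
		}
		resp.Body.Close()
		resCh <- result{resp.StatusCode, nil}
	}()

	// Wait until the handler is running (or the request failed outright).
	select {
	case <-started:
	case res := <-resCh:
		return res.code, res.err
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return 0, err
	}
	res := <-resCh
	return res.code, res.err
}

func main() {
	code, err := inFlightSurvives()
	fmt.Println(code, err)
}
```

The request that was already running completes normally even though Shutdown was called midway. What Shutdown cannot help with is a new connection attempt after the listener has closed, which is the case the 5-second prelude exists for.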

The Two-Phase Shutdown Timeline

The shutdown sequence is timing-sensitive: Kubernetes removes pods from Service endpoints asynchronously, so the readiness probe must flip unhealthy before the HTTP listener closes:

sequenceDiagram
    participant K8s as Kubernetes
    participant Pod as Pod (Go server)
    participant LB as Service / kube-proxy
    participant Client
    Note over K8s,Client: t=0 — pod is healthy, serving traffic
    Client->>LB: GET /work
    LB->>Pod: forward
    Pod-->>LB: 200 OK
    LB-->>Client: 200 OK
    Note over K8s,Pod: t=0 — rolling update starts
    K8s->>Pod: SIGTERM
    Note over Pod: PHASE 1<br/>Mark readiness UNHEALTHY<br/>Listener still open
    Pod->>Pod: readiness=false
    Note over LB,Pod: K8s removes pod from endpoints<br/>(propagates async, ~1-3s)
    Note over Pod: Sleep 5 seconds<br/>so endpoint removal propagates
    Client->>LB: GET /work (late request)
    LB->>Pod: forward (still routed)
    Pod-->>LB: 200 OK
    Note over Pod: t=5s — PHASE 2<br/>Server.Shutdown(25s)
    Pod->>Pod: stop Accept loop
    Pod->>Pod: wait for in-flight requests
    Pod->>Pod: wg.Wait() for background workers
    Note over Pod: clean exit within the grace period<br/>K8s sends SIGKILL at t=30s (default)

The diagram is the entire production-shutdown discipline in one picture^{[Kubernetes docs]}: never close the listener before draining the LB; never Shutdown without a deadline; never forget background workers.

Shutdown Approach Decision Table

Choose your shutdown strategy based on deployment environment and constraints:

Scenario	Approach	Key Pattern	Timing
Kubernetes with rolling updates	Two-phase: mark unhealthy, wait 5s, shutdown	`SetNotReady()` + 5s sleep + `Server.Shutdown(25s)`	30-35s total
Standalone or VM-based	Direct shutdown on SIGTERM	`Server.Shutdown(30s)`	30s
Long-lived connections (WebSocket/gRPC)	Track hijacked connections separately	`ConnTracker.CloseAll()` after HTTP shutdown	Variable
Multiple background workers	Coordinate with WaitGroup	HTTP shutdown + `workers.Wait()` + timeout	35-40s

Signal handling and health probes

^{[Go net/http]}

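The core pattern uses signal.NotifyContext (Go 1.16+) to catch SIGTERM and separate readiness/liveness probes. The listing also relies on method-prefixed ServeMux patterns such as "GET /readyz" (Go 1.22+), log/slog (Go 1.21+), and atomic.Bool (Go 1.19+):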

package main
 
import (
	"context"
	"errors"
	"log/slog"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)
 
type HealthChecker struct {
	ready atomic.Bool
}
 
func (h *HealthChecker) SetNotReady() { h.ready.Store(false) }
 
func (h *HealthChecker) Readiness(w http.ResponseWriter, r *http.Request) {
	if !h.ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}
 
func (h *HealthChecker) Liveness(w http.ResponseWriter, r *http.Request) {
	// Always return 200 — never restart during shutdown
	w.WriteHeader(http.StatusOK)
}
 
func run(ctx context.Context) error {
	ctx, stop := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
	defer stop()
 
	health := &HealthChecker{}
	health.ready.Store(true)
 
	mux := http.NewServeMux()
	mux.HandleFunc("GET /healthz", health.Liveness)
	mux.HandleFunc("GET /readyz", health.Readiness)
	mux.HandleFunc("GET /", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(100 * time.Millisecond)
		w.WriteHeader(http.StatusOK)
	})
 
	srv := &http.Server{
		Addr:        ":8080",
		Handler:     mux,
		IdleTimeout: 60 * time.Second,
	}
 
	// Start server
	errCh := make(chan error, 1)
	go func() {
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			errCh <- err
		}
	}()
 
	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		// Phase 1: mark not-ready, wait for LB to drain
		health.SetNotReady()
		time.Sleep(5 * time.Second)
 
		// Phase 2: shutdown HTTP server
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
		defer cancel()
 
		return srv.Shutdown(shutdownCtx)
	}
}
 
func main() {
	if err := run(context.Background()); err != nil {
		slog.Error("fatal", "err", err)
		os.Exit(1)
	}
}

The 5-second wait between SetNotReady() and Shutdown() lets Kubernetes endpoints propagate. The 25-second shutdown timeout fits within the default 30-second terminationGracePeriodSeconds (5s + 25s = 30s).

Separate the /healthz (liveness) and /readyz (readiness) probes. Liveness always returns 200, which prevents restarts during shutdown; readiness returns 503 once SetNotReady() is called, which takes the pod out of load-balancer rotation. This distinction is critical. If liveness returns 503, the kubelet restarts the container instead of letting it drain. If readiness never flips, any load balancer that relies on the probe keeps sending requests after SIGTERM fires.
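The probe split is easy to verify without a cluster. The following sketch uses net/http/httptest to call both handlers the way a kubelet would, before and after the readiness flag flips. The type `probeState` and the helper `probeCodes` are names invented for this example; they mirror the HealthChecker above but are not part of it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// probeState mirrors the HealthChecker: readiness can flip, liveness cannot.
type probeState struct{ ready atomic.Bool }

func (p *probeState) readiness(w http.ResponseWriter, _ *http.Request) {
	if !p.ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func (p *probeState) liveness(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
}

// probeCodes returns the status codes (liveness, readiness) a prober would see.
func probeCodes(p *probeState) (int, int) {
	live := httptest.NewRecorder()
	p.liveness(live, httptest.NewRequest(http.MethodGet, "/healthz", nil))
	ready := httptest.NewRecorder()
	p.readiness(ready, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return live.Code, ready.Code
}

func main() {
	p := &probeState{}
	p.ready.Store(true)
	fmt.Println(probeCodes(p)) // serving: 200 200
	p.ready.Store(false)       // what SetNotReady() does on SIGTERM
	fmt.Println(probeCodes(p)) // draining: 200 503
}
```

A check like this fits in a unit test and guards against someone later wiring both probes to the same handler.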

ErrServerClosed Is Not an Error

ListenAndServe returns http.ErrServerClosed when Shutdown is called — this is expected behavior. Always check with errors.Is(err, http.ErrServerClosed) before treating it as an error.
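One consequence is worth seeing in code: the serving call returns as soon as Shutdown starts, so the program must keep running until Shutdown itself returns. The sketch below is a minimal illustration under the assumption that a loopback port is available; `serveAndShutdown` is a name made up for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// serveAndShutdown starts a server on an ephemeral port, shuts it down, and
// reports whether Serve ended with http.ErrServerClosed plus Shutdown's error.
func serveAndShutdown() (closedCleanly bool, shutdownErr error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}

	serveDone := make(chan error, 1)
	go func() { serveDone <- srv.Serve(ln) }()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	shutdownErr = srv.Shutdown(ctx)

	// Serve returns ErrServerClosed once Shutdown has been called; that is
	// the normal exit path, not a failure.
	serveErr := <-serveDone
	return errors.Is(serveErr, http.ErrServerClosed), shutdownErr
}

func main() {
	ok, err := serveAndShutdown()
	fmt.Println(ok, err)
}
```

If main returned right after the serving goroutine finished, the process would exit while Shutdown was still draining connections, which defeats the purpose of calling it.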

Background workers: queues, cron, and goroutines

^{[Go context]}

Server.Shutdown waits for in-flight HTTP requests but not background goroutines. Use sync.WaitGroup to track and drain queue consumers, cron jobs, and other long-lived workers:

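// Requires "fmt" and "sync" in addition to the imports in the previous listing.
type App struct {
	srv     *http.Server
	health  *HealthChecker
	workers sync.WaitGroup
}

// StartWorker runs fn in a tracked goroutine so Shutdown can wait for it.
func (a *App) StartWorker(ctx context.Context, fn func(ctx context.Context)) {
	a.workers.Add(1)
	go func() {
		defer a.workers.Done()
		fn(ctx)
	}()
}

// Shutdown drains the load balancer, the HTTP server, then the workers.
// The 5-second sleep counts against ctx's deadline, so size ctx accordingly.
func (a *App) Shutdown(ctx context.Context) error {
	a.health.SetNotReady()
	time.Sleep(5 * time.Second)

	// Record the HTTP error but keep going, so workers still get to finish.
	httpErr := a.srv.Shutdown(ctx)

	// Wait for background workers
	done := make(chan struct{})
	go func() {
		a.workers.Wait()
		close(done)
	}()

	select {
	case <-done:
		return httpErr
	case <-ctx.Done():
		return errors.Join(httpErr, fmt.Errorf("workers timeout: %w", ctx.Err()))
	}
}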

Workers must check ctx.Done() in their loops to exit cleanly:

func consumer(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		msg, err := queue.Receive(ctx, 10*time.Second)
		if err != nil {
			if ctx.Err() != nil {
				return
			}
			continue
		}
		process(ctx, msg)
	}
}

Shutdown order is critical: readiness flips and the load balancer removes the pod from endpoints → the HTTP server stops accepting connections and drains in-flight requests → background workers finish their in-flight work → databases and external connections close. Close databases after workers.Wait(), not before, otherwise workers will fail with "connection closed" errors on their final operations. This ordering ensures no resource is torn down while a goroutine still depends on it.
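One way to make that ordering hard to get wrong is to express it as data. The sketch below is one possible shape, not the article's implementation: `step`, `shutdownInOrder`, and the four stage names are hypothetical. It runs every stage in sequence even when an earlier one fails, so later resources are still released, and joins the failures with errors.Join (Go 1.20+).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// step is one named stage of the teardown sequence.
type step struct {
	name string
	run  func(context.Context) error
}

// shutdownInOrder runs every step sequentially, even if an earlier one
// fails, and returns all failures joined into a single error.
func shutdownInOrder(ctx context.Context, steps []step) error {
	var errs []error
	for _, s := range steps {
		if err := s.run(ctx); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
		}
	}
	return errors.Join(errs...)
}

// demoOrder wires four placeholder stages and returns the order they ran in.
func demoOrder() []string {
	var order []string
	record := func(name string) step {
		return step{name: name, run: func(context.Context) error {
			order = append(order, name)
			return nil
		}}
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_ = shutdownInOrder(ctx, []step{
		record("drain-lb"), // flip readiness, then sleep
		record("http"),     // srv.Shutdown
		record("workers"),  // workers.Wait
		record("database"), // db.Close comes last
	})
	return order
}

func main() {
	fmt.Println(demoOrder()) // [drain-lb http workers database]
}
```

In a real service each placeholder would wrap the corresponding call, and the slice literal becomes the single place where the order is reviewed.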

A common pattern is to pass the signal context to every worker at startup. When SIGTERM cancels that context, the main goroutine unblocks from <-ctx.Done() and every worker sees the same cancellation. Workers that check ctx.Done() in their loops return as soon as their current unit of work finishes, which lets workers.Wait() complete.
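A small, self-contained sketch of that fan-out follows. It uses a plain cancellable context in place of the signal context, and the function name `runWorkers`, the 5 ms tick, and the 20 ms run time are all assumptions chosen to keep the example fast.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runWorkers starts n workers that share one context, cancels it after a
// short delay, waits for all of them, and returns how many exited cleanly.
func runWorkers(n int) int64 {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	var exited atomic.Int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ticker := time.NewTicker(5 * time.Millisecond)
			defer ticker.Stop()
			for {
				select {
				case <-ctx.Done():
					exited.Add(1) // saw the shared cancellation
					return
				case <-ticker.C:
					// one unit of background work would go here
				}
			}
		}()
	}

	time.Sleep(20 * time.Millisecond)
	cancel()  // stands in for SIGTERM cancelling the signal context
	wg.Wait() // returns only after every worker has exited
	return exited.Load()
}

func main() {
	fmt.Println(runWorkers(3)) // 3
}
```

Because every worker selects on the same ctx.Done() channel, a single cancel call is enough; no per-worker stop channel is needed.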

Kubernetes configuration and timing

^{[Kubernetes docs]}

Set terminationGracePeriodSeconds to at least 35 seconds (5s preStop hook + 5s in-app drain + 25s Server.Shutdown timeout), and add headroom on top if you can:

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 35
      containers:
        - name: api
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            periodSeconds: 5
            failureThreshold: 1
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "5"]

The preStop hook adds an extra 5-second buffer before SIGTERM, giving the control plane time to propagate the pod removal before the signal reaches your application. It requires a sleep binary in the container image, and its duration counts against terminationGracePeriodSeconds. Combined with the 5-second in-app wait, this covers most endpoint propagation delays. The readinessProbe with failureThreshold: 1 marks the pod unready on the first failed check, which can be up to one periodSeconds interval (5s here) after the flag flips. The livenessProbe with failureThreshold: 3 gives the app a few chances to respond before the kubelet restarts it, which prevents false positives during high load. During shutdown, the liveness probe never fails (it always returns 200), so the pod is never restarted, only drained.

terminationGracePeriodSeconds Must Exceed Your Shutdown Budget

If terminationGracePeriodSeconds is shorter than (preStop + in-app drain + Server.Shutdown timeout), Kubernetes sends SIGKILL and connections drop. The formula: preStop (5s) + drain (5s) + shutdown (25s) = 35s total. Set terminationGracePeriodSeconds to 35s or higher.
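The arithmetic is simple enough to encode as a startup check, so a mismatch between the manifest and the code fails fast instead of surfacing as dropped connections. This is a sketch under assumptions: the `budget` type and its methods are invented for this example, and how the grace period reaches the process (an environment variable, a flag) is left to you.

```go
package main

import (
	"fmt"
	"time"
)

// budget lists every phase that runs inside the termination grace period.
type budget struct {
	PreStop, Drain, Shutdown, Headroom time.Duration
}

// newBudget builds a budget from whole seconds.
func newBudget(preStop, drain, shutdown, headroom int) budget {
	s := func(n int) time.Duration { return time.Duration(n) * time.Second }
	return budget{s(preStop), s(drain), s(shutdown), s(headroom)}
}

// required is the minimum terminationGracePeriodSeconds as a duration.
func (b budget) required() time.Duration {
	return b.PreStop + b.Drain + b.Shutdown + b.Headroom
}

// fitsSeconds reports whether a grace period of graceSeconds covers the budget.
func (b budget) fitsSeconds(graceSeconds int) bool {
	return b.required() <= time.Duration(graceSeconds)*time.Second
}

func main() {
	b := newBudget(5, 5, 25, 0)
	fmt.Println(b.required(), b.fitsSeconds(30), b.fitsSeconds(35)) // 35s false true
}
```

With the default grace period of 30 seconds the check fails, which is exactly the situation where the final seconds of Server.Shutdown would be cut off by SIGKILL.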

Connection timeouts and long-lived connections

^{[RFC 9110, 2022]}

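Set IdleTimeout on the HTTP server (this example uses 60 seconds). Server.Shutdown waits for active requests and closes idle connections immediately, but it neither waits for nor closes hijacked connections such as WebSockets. gRPC streams served by grpc.Server are drained separately through GracefulStop. For hijacked connections, track and close them manually: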

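// Requires the "net" and "sync" packages.
type ConnTracker struct {
	mu    sync.Mutex
	conns map[net.Conn]struct{}
}

// NewConnTracker initialises the map; Add would panic on a nil map.
func NewConnTracker() *ConnTracker {
	return &ConnTracker{conns: make(map[net.Conn]struct{})}
}

// Add registers a hijacked connection.
func (t *ConnTracker) Add(conn net.Conn) {
	t.mu.Lock()
	t.conns[conn] = struct{}{}
	t.mu.Unlock()
}

// Remove forgets a connection that has already closed on its own.
func (t *ConnTracker) Remove(conn net.Conn) {
	t.mu.Lock()
	delete(t.conns, conn)
	t.mu.Unlock()
}

// CloseAll closes every connection that is still tracked.
func (t *ConnTracker) CloseAll() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for conn := range t.conns {
		conn.Close()
	}
}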

Call tracker.CloseAll() after srv.Shutdown() returns.
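To see what the peer observes when that call runs, the sketch below registers one end of an in-memory net.Pipe as a stand-in for a hijacked WebSocket connection. It defines its own lowercase `connTracker` so it compiles on its own; that type and the helper `closeTracked` are illustrative names, not part of the listing above.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
)

// connTracker is a compact stand-in for the tracker described above.
type connTracker struct {
	mu    sync.Mutex
	conns map[net.Conn]struct{}
}

func newConnTracker() *connTracker {
	return &connTracker{conns: make(map[net.Conn]struct{})}
}

func (t *connTracker) add(c net.Conn) {
	t.mu.Lock()
	t.conns[c] = struct{}{}
	t.mu.Unlock()
}

// closeAll closes and forgets every tracked connection, returning the count.
func (t *connTracker) closeAll() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	n := len(t.conns)
	for c := range t.conns {
		c.Close()
		delete(t.conns, c)
	}
	return n
}

// closeTracked tracks the server side of a pipe, closes everything, and
// reports how many connections were closed and whether the peer saw EOF.
func closeTracked() (closed int, peerSawEOF bool) {
	server, client := net.Pipe()
	defer client.Close()

	t := newConnTracker()
	t.add(server) // in a real handler this happens right after Hijack()

	closed = t.closeAll() // in a real service: after srv.Shutdown returns

	_, err := client.Read(make([]byte, 1))
	return closed, err == io.EOF
}

func main() {
	fmt.Println(closeTracked()) // 1 true
}
```

A bare close is abrupt from the client's point of view. Protocols with their own close handshake, such as WebSocket close frames, may warrant sending that frame before closing the underlying connection.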

Production checklist

Kubernetes manifest that survives a rolling deploy

The pattern below is what we run on services with sustained 1k+ RPS — preStop hook absorbs the endpoint-propagation race, terminationGracePeriodSeconds gives the in-app drain a real budget, and the readiness/liveness split prevents the kubelet from killing a draining pod prematurely:

# deployment.yaml — production drain settings
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 40   # preStop(5) + drain(5) + Server.Shutdown(25) + 5s headroom
      containers:
        - name: api
          # When this returns failure, the pod is removed from Service endpoints.
          # Readiness may check dependencies; an unready pod still serves its
          # in-flight requests, it just stops receiving new ones via the LB.
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 2
 
          # Liveness only checks the process is alive — never the dependencies.
          # Otherwise a flaky downstream restarts your healthy pod and amplifies
          # the outage instead of containing it.
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 6
 
          lifecycle:
            preStop:
              exec:
                # Sleep > kube-proxy iptables propagation lag (typically 1-3s).
                # Critical: this gives the LB time to remove the pod from the
                # rotation BEFORE the app starts refusing new connections.
                command: ["/bin/sh", "-c", "sleep 5"]

Pair the manifest with a vegeta load test that proves zero 5xx during a rolling deploy — paste this into your CI matrix so a regression in shutdown handling shows up as a build failure, not a 3am page:

# Run a constant-rate load against the service for 60s, kicking off a
# kubectl rollout halfway through. Any non-2xx during the rollout is a bug.
echo "GET https://api.example.com/healthz" | \
  vegeta attack -rate=200 -duration=60s -timeout=2s | \
  tee /tmp/results.bin > /dev/null &
VEGETA_PID=$!
 
sleep 25
kubectl rollout restart deployment/api
wait "$VEGETA_PID"
 
vegeta report -type=text < /tmp/results.bin
# Acceptable: success rate >= 99.99%, P99 < 250ms, zero 5xx.

The two settings teams forget that cause "I see 502s during deploy" tickets:

# Set on the Ingress so a request that hits a draining pod is retried on another upstream.
# AWS ALB defaults are usually fine; ingress-nginx needs the proxy-next-upstream
# annotations below, or it returns a 502 to the client instead of retrying.
# Note: nginx does not retry non-idempotent requests (POST, PATCH) unless non_idempotent is added.
nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502 http_503"
nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"

// And on the Go side, register the SIGTERM handler BEFORE the listener
// starts. A common mistake is registering it after, leaving a millisecond
// window where SIGTERM kills the process during startup.
package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // Buffered so a signal that arrives before the receive is not dropped.
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)

    mux := http.NewServeMux() // register your routes here
    srv := &http.Server{Addr: ":8080", Handler: mux}
    go func() {
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            log.Fatal(err)
        }
    }()

    <-sigCh
    // Keep the timeout below terminationGracePeriodSeconds.
    ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("shutdown: %v", err)
    }
}

Draining goroutines without leaking work

The trickiest shutdowns are not the HTTP layer — they are the long-tail goroutines that an HTTP server never tracks. Cache refreshers, log flushers, prefetchers, anomaly scanners, and queue consumers all live outside the request lifecycle, and they are the most common cause of post-deploy data loss. The pattern that actually works in production is one cancellation context fanned out to every worker, paired with a WaitGroup that the shutdown path blocks on with a hard ceiling. Without the ceiling, a single hung worker pins the pod until SIGKILL fires, so any guarantee you thought you had about clean exit silently downgrades to a forced kill.

type Drainer struct {
	wg     sync.WaitGroup
	cancel context.CancelFunc
	ctx    context.Context
}
 
func NewDrainer(parent context.Context) *Drainer {
	ctx, cancel := context.WithCancel(parent)
	return &Drainer{ctx: ctx, cancel: cancel}
}
 
func (d *Drainer) Go(name string, fn func(context.Context) error) {
	d.wg.Add(1)
	go func() {
		defer d.wg.Done()
		if err := fn(d.ctx); err != nil && !errors.Is(err, context.Canceled) {
			slog.Error("worker exited", "name", name, "err", err)
		}
	}()
}
 
func (d *Drainer) Shutdown(timeout time.Duration) error {
	d.cancel()
	done := make(chan struct{})
	go func() { d.wg.Wait(); close(done) }()
	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("drain exceeded %s; forced exit", timeout)
	}
}

The Drainer is the contract: every background goroutine receives the same parent context, every worker exits when that context cancels, and the deadline ensures one buggy task cannot indefinitely block the deploy.
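The effect of the hard ceiling is easiest to see with one worker that misbehaves. The sketch below reduces the Drainer to a single `drain` function and runs it twice: once with a well-behaved worker only, once with an extra worker that ignores cancellation. The names `drain` and `drainDemo` and the 50 ms timeout are assumptions for this example.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// drain cancels the workers and waits for them, giving up after timeout.
func drain(cancel context.CancelFunc, wg *sync.WaitGroup, timeout time.Duration) error {
	cancel()
	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("drain exceeded %s", timeout)
	}
}

// drainDemo runs one well-behaved worker and, optionally, one that never
// looks at its context, then drains with a 50ms ceiling.
func drainDemo(includeHung bool) error {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup

	wg.Add(1)
	go func() { // well-behaved: exits when ctx is cancelled
		defer wg.Done()
		<-ctx.Done()
	}()

	stuck := make(chan struct{})
	defer close(stuck) // releases the hung worker once the demo is over
	if includeHung {
		wg.Add(1)
		go func() { // buggy: blocks without checking ctx
			defer wg.Done()
			<-stuck
		}()
	}
	return drain(cancel, &wg, 50*time.Millisecond)
}

func main() {
	fmt.Println(drainDemo(false)) // <nil>
	fmt.Println(drainDemo(true))  // drain exceeded 50ms
}
```

The second call returns after 50 ms instead of blocking forever, which is the difference between a logged drain failure and a silent SIGKILL.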

Background jobs that must survive SIGTERM

Some workloads do not want a hard cap. A long-running batch reconciliation, a checkpointed migration, or a half-written upload to object storage cannot be safely interrupted — but the HTTP layer still needs to drain in seconds. The pattern is to split the lifecycle: the HTTP server obeys the standard 25-second budget, while protected jobs persist their checkpoint, then either complete inline or hand off via a durable queue so a freshly scheduled pod can resume. Mark these jobs explicitly so reviewers see why they bypass the normal drain budget.

type CheckpointedJob struct {
	store    Store
	jobID    string
	finished atomic.Bool
}
 
func (j *CheckpointedJob) Run(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			return j.store.SaveCheckpoint(j.jobID, j.snapshot())
		default:
		}
		batch, done, err := j.processNext(ctx)
		if err != nil {
			return err
		}
		if done {
			j.finished.Store(true)
			return j.store.MarkComplete(j.jobID)
		}
		if err := j.store.SaveCheckpoint(j.jobID, batch); err != nil {
			return err
		}
	}
}

The discipline is to checkpoint before every external side effect, not at the end. A pod can disappear mid-iteration; the next pod must read the last persisted checkpoint and resume without re-emitting work.
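A toy version of the resume path makes the idea concrete. The sketch below is a simplification, not the CheckpointedJob above: it persists progress after each item rather than only at the end, uses an in-memory `memStore` in place of a durable store, and simulates SIGTERM by cancelling the context after two items. All names here are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// memStore stands in for a durable checkpoint store.
type memStore struct {
	mu   sync.Mutex
	next map[string]int // jobID -> index of the next unprocessed item
}

func (s *memStore) load(id string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.next[id]
}

func (s *memStore) save(id string, next int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.next[id] = next
}

// runJob processes items from the stored checkpoint onward and persists
// progress after each one. It stops between items once ctx is cancelled.
func runJob(ctx context.Context, s *memStore, id string, items []string, emit func(string)) error {
	for i := s.load(id); i < len(items); i++ {
		if err := ctx.Err(); err != nil {
			return err // progress up to item i-1 is already persisted
		}
		emit(items[i])
		s.save(id, i+1)
	}
	return nil
}

// resumeDemo interrupts a job after two items, resumes it with a fresh
// context, and returns everything that was emitted across both runs.
func resumeDemo() []string {
	store := &memStore{next: make(map[string]int)}
	items := []string{"a", "b", "c", "d"}
	var emitted []string

	// First pod: cancellation arrives after two items.
	ctx, cancel := context.WithCancel(context.Background())
	_ = runJob(ctx, store, "job-1", items, func(it string) {
		emitted = append(emitted, it)
		if len(emitted) == 2 {
			cancel()
		}
	})
	cancel()

	// Replacement pod: picks up from the persisted checkpoint.
	_ = runJob(context.Background(), store, "job-1", items, func(it string) {
		emitted = append(emitted, it)
	})
	return emitted
}

func main() {
	fmt.Println(resumeDemo()) // [a b c d]
}
```

In this simplified form a crash between emit and save would replay one item on resume, so the side effect should be idempotent or the checkpoint should record intent before the effect, as the paragraph above recommends.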

gRPC GracefulStop versus HTTP Shutdown

gRPC services need a parallel sequence with one important difference: grpc.Server.GracefulStop blocks until all RPCs (including streaming ones) finish. There is no built-in deadline, so wrap it in a timeout select and fall back to Stop if streams refuse to drain. The shape mirrors HTTP shutdown but the failure mode is different — a hung bidirectional stream will pin GracefulStop forever unless you escalate.

func shutdownGRPC(ctx context.Context, srv *grpc.Server) error {
	done := make(chan struct{})
	go func() { srv.GracefulStop(); close(done) }()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		srv.Stop() // hard cancel: aborts streams, releases the listener
		return ctx.Err()
	}
}

When a service exposes both HTTP and gRPC on separate ports, run the readiness flip first, then trigger both shutdowns concurrently with a shared deadline. Aggregate the errors rather than short-circuiting: if HTTP drains cleanly but gRPC times out, you still want the gRPC timeout reported so you can track down the streaming handler that refused to release.
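The concurrent, aggregate-everything shape can be sketched with the standard library alone. To stay free of external dependencies, the example below takes plain functions rather than real *http.Server and *grpc.Server values; in a service, the "http" entry would call srv.Shutdown and the "grpc" entry would call a helper like shutdownGRPC above. `shutdownAll` and `demoShutdown` are hypothetical names, and errors.Join needs Go 1.20+.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// shutdownAll runs every shutdown function concurrently under one shared
// context and returns all failures joined together (nil if none failed).
func shutdownAll(ctx context.Context, fns map[string]func(context.Context) error) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for name, fn := range fns {
		wg.Add(1)
		go func(name string, fn func(context.Context) error) {
			defer wg.Done()
			if err := fn(ctx); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("%s: %w", name, err))
				mu.Unlock()
			}
		}(name, fn)
	}
	wg.Wait()
	return errors.Join(errs...)
}

// demoShutdown simulates an HTTP server that drains at once and a gRPC
// server whose stream never finishes before the shared 50ms deadline.
func demoShutdown() error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return shutdownAll(ctx, map[string]func(context.Context) error{
		"http": func(context.Context) error { return nil },
		"grpc": func(ctx context.Context) error {
			<-ctx.Done() // stands in for a stream that refuses to drain
			return ctx.Err()
		},
	})
}

func main() {
	fmt.Println(demoShutdown()) // grpc: context deadline exceeded
}
```

Because each error is wrapped with %w, callers can still test the joined result with errors.Is(err, context.DeadlineExceeded) while the prefix identifies which server was slow.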

Frequently Asked Questions

How does Go's http.Server.Shutdown work?

Shutdown closes the listeners so no new connections are accepted, closes idle connections, and waits for in-flight requests to complete. If the context deadline passes first, it returns the context's error. It does not interrupt active requests; it lets them finish.

Why do I get 502 errors during Kubernetes rolling updates?

Kubernetes removes the pod from the Service endpoints asynchronously after sending SIGTERM. Requests can arrive after SIGTERM but before the load balancer updates. Fix this by marking the readiness endpoint (not the liveness endpoint) unhealthy and adding a short delay before calling Shutdown.

What signal does Kubernetes send to stop a pod?

Kubernetes sends SIGTERM first and waits for terminationGracePeriodSeconds (default 30s). If the process is still running after that period, it sends SIGKILL, which cannot be caught or handled.

How do you handle background goroutines during Go server shutdown?

Use sync.WaitGroup to track active background goroutines (queue consumers, cron jobs). On SIGTERM, cancel their context, then call wg.Wait() after srv.Shutdown() to ensure all background work completes before the process exits.
