Go Graceful HTTP Shutdown: Zero-Downtime Production Patterns
Key Takeaways
- Kubernetes removes pods from endpoints asynchronously after SIGTERM, so requests can arrive after shutdown begins and surface as 502s
- Two-phase shutdown: mark readiness unhealthy first (drains the LB), then call Server.Shutdown
- Use sync.WaitGroup to wait for background goroutines before exit; Server.Shutdown doesn't
The classic Go-on-Kubernetes rollout incident: a rolling update produces a burst of 502 errors over about 5 seconds. Server.Shutdown is not broken. The cause is that Kubernetes removes pods from Service endpoints asynchronously after sending SIGTERM, so requests keep arriving at pods that have already closed their listeners. We debugged this exact race on multiple production Go services, and the fix is always the same: a two-phase shutdown that marks readiness unhealthy 5 seconds before closing the listener.
Server.Shutdown [Go net/http] stops accepting new connections and waits for in-flight requests to finish. But it doesn't wait for background goroutines: queue consumers, cron jobs, and cache warmers keep running. Nor does it solve the Kubernetes timing race. If you call Shutdown immediately on SIGTERM, requests that the load balancer is still routing to the pod hit a closed listener and fail.
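The first point is easy to see in isolation. The following minimal sketch is not from the original service. It starts a server plus one goroutine that Shutdown knows nothing about, and checks whether that goroutine is still running when Shutdown returns. The 500 ms sleep is an arbitrary stand-in for real background work, and shutdownLeavesWorkerRunning is a hypothetical helper name.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"sync/atomic"
	"time"
)

// shutdownLeavesWorkerRunning starts an http.Server and one untracked
// background goroutine, calls Shutdown, and reports whether that goroutine
// was still running when Shutdown returned.
func shutdownLeavesWorkerRunning() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	srv := &http.Server{Handler: http.NotFoundHandler()}
	go srv.Serve(ln) // returns http.ErrServerClosed once Shutdown runs

	var workerDone atomic.Bool
	go func() {
		// Stand-in for a queue consumer or cache warmer; Shutdown has no idea it exists.
		time.Sleep(500 * time.Millisecond)
		workerDone.Store(true)
	}()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return false, err
	}
	// With no open connections, Shutdown returns almost immediately.
	return !workerDone.Load(), nil
}

func main() {
	running, err := shutdownLeavesWorkerRunning()
	fmt.Println("worker still running after Shutdown:", running, err)
}
```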
The fix is a two-phase shutdown: first mark the readiness probe unhealthy (so Kubernetes removes the pod from the load balancer), wait 5 seconds, then call Server.Shutdown with a 25-second timeout. Use sync.WaitGroup to coordinate background goroutines that Server.Shutdown doesn't track.
- Mark unhealthy before Shutdown; this drains the LB before connections close
- Time out Server.Shutdown at 25s; with the 5-second prelude, that fills the 30-second default Kubernetes grace period
- Wait for background workers with WaitGroup after the HTTP server stops
sequenceDiagram
participant K as Kubernetes
participant LB as Service / LB
participant App as Go process
participant Bg as Background workers
K->>App: SIGTERM
Note over App: Phase 1: signal readiness=unhealthy
App->>LB: /readyz returns 503
LB-->>K: drop endpoint asynchronously<br/>eventual, ~seconds
Note over App: Phase 1 sleep ~5s<br/>ensures LB has dropped us
Note over App: Phase 2 — Server.Shutdown ctx, 25s
App->>App: stop accepting new conns<br/>wait for in-flight to finish
App->>Bg: cancel ctx
Bg-->>App: workers drain (WaitGroup)
App->>K: process exit 0
Note over K: K8s sends SIGKILL<br/>after grace period, 30s default
The diagram is the timing-race lesson in one picture. The LB drop is asynchronous, so if you call Server.Shutdown immediately on SIGTERM, requests the LB still routes to the pod during its drop window land on a closed listener. The 5-second prelude is the cheapest fix; the WaitGroup tail is the part Server.Shutdown won't do for you.
The Two-Phase Shutdown Timeline
The shutdown sequence is timing-sensitive: Kubernetes removes pods from Service endpoints asynchronously, so the readiness probe must flip unhealthy before the HTTP listener closes:
sequenceDiagram
participant K8s as Kubernetes
participant Pod as Pod (Go server)
participant LB as Service / kube-proxy
participant Client
Note over K8s,Client: t=0 — pod is healthy, serving traffic
Client->>LB: GET /work
LB->>Pod: forward
Pod-->>LB: 200 OK
LB-->>Client: 200 OK
Note over K8s,Pod: t=0 — rolling update starts
K8s->>Pod: SIGTERM
Note over Pod: PHASE 1<br/>Mark readiness UNHEALTHY<br/>Listener still open
Pod->>Pod: readiness=false
Note over LB,Pod: K8s removes pod from endpoints<br/>(propagates async, ~1-3s)
Note over Pod: Sleep 5 seconds<br/>so the LB stops routing new requests
Client->>LB: GET /work (in-flight)
LB->>Pod: forward (still routed)
Pod-->>LB: 200 OK
Note over Pod: t=5s — PHASE 2<br/>Server.Shutdown(25s)
Pod->>Pod: stop Accept loop
Pod->>Pod: wait for in-flight requests
Pod->>Pod: wg.Wait() for background workers
Note over Pod: by t=30s, clean exit<br/>before K8s SIGKILL at t=30s
The diagram captures the entire production-shutdown discipline in one picture [Kubernetes docs]: never close the listener before draining the LB, never call Shutdown without a deadline, and never forget background workers.
Shutdown Approach Decision Table
Choose your shutdown strategy based on deployment environment and constraints:
| Scenario | Approach | Key Pattern | Timing |
|---|---|---|---|
| Kubernetes with rolling updates | Two-phase: mark unhealthy, wait 5s, shutdown | SetNotReady() + 5s sleep + Server.Shutdown(25s) | 30-35s total |
| Standalone or VM-based | Direct shutdown on SIGTERM | Server.Shutdown(30s) | 30s |
| Long-lived connections (WebSocket/gRPC) | Track hijacked connections separately | ConnTracker.CloseAll() after HTTP shutdown | Variable |
| Multiple background workers | Coordinate with WaitGroup | HTTP shutdown + workers.Wait() + timeout | 35-40s |
Signal handling and health probes
[Go net/http] The core pattern uses signal.NotifyContext (Go 1.16+) to catch SIGTERM, together with separate readiness and liveness probes. The method-qualified mux patterns such as "GET /healthz" require Go 1.22+:
package main
import (
	"context"
	"errors"
	"log/slog"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)
type HealthChecker struct {
	ready atomic.Bool
}
func (h *HealthChecker) SetNotReady() { h.ready.Store(false) }
func (h *HealthChecker) Readiness(w http.ResponseWriter, r *http.Request) {
	if !h.ready.Load() {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}
func (h *HealthChecker) Liveness(w http.ResponseWriter, r *http.Request) {
	// Always return 200: never restart during shutdown
	w.WriteHeader(http.StatusOK)
}
func run(ctx context.Context) error {
	ctx, stop := signal.NotifyContext(ctx, syscall.SIGINT, syscall.SIGTERM)
	defer stop()
	health := &HealthChecker{}
	health.ready.Store(true)
	mux := http.NewServeMux()
	mux.HandleFunc("GET /healthz", health.Liveness)
	mux.HandleFunc("GET /readyz", health.Readiness)
	mux.HandleFunc("GET /", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(100 * time.Millisecond)
		w.WriteHeader(http.StatusOK)
	})
	srv := &http.Server{
		Addr:        ":8080",
		Handler:     mux,
		IdleTimeout: 60 * time.Second,
	}
	// Start server
	errCh := make(chan error, 1)
	go func() {
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			errCh <- err
		}
	}()
	select {
	case err := <-errCh:
		return err
	case <-ctx.Done():
		// Phase 1: mark not-ready, wait for LB to drain
		health.SetNotReady()
		time.Sleep(5 * time.Second)
		// Phase 2: shutdown HTTP server
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
		defer cancel()
		return srv.Shutdown(shutdownCtx)
	}
}
func main() {
	if err := run(context.Background()); err != nil {
		slog.Error("fatal", "err", err)
		os.Exit(1)
	}
}
The 5-second wait between SetNotReady() and Shutdown() lets Kubernetes endpoints propagate. The 25-second shutdown timeout fits within the default 30-second terminationGracePeriodSeconds (5s + 25s = 30s).
Separate the /healthz (liveness) and /readyz (readiness) probes. Liveness always returns 200, which prevents cascading restarts during shutdown. Readiness returns 503 once SetNotReady() is called, which removes the pod from the load balancer. This distinction is critical. If liveness returns 503, the kubelet restarts the container instead of letting it drain. If you don't mark readiness unhealthy, the load balancer keeps sending requests after SIGTERM fires.
ListenAndServe returns http.ErrServerClosed when Shutdown is called — this is expected behavior. Always check with errors.Is(err, http.ErrServerClosed) before treating it as an error.
Background workers: queues, cron, and goroutines
[Go context] Server.Shutdown waits for in-flight HTTP requests but not for background goroutines. Use sync.WaitGroup to track and drain queue consumers, cron jobs, and other long-lived workers:
package main // same package as the listing above; HealthChecker is defined there
import (
	"context"
	"fmt"
	"net/http"
	"sync"
	"time"
)
type App struct {
	srv     *http.Server
	health  *HealthChecker
	workers sync.WaitGroup
}
func (a *App) StartWorker(ctx context.Context, fn func(ctx context.Context)) {
	a.workers.Add(1)
	go func() {
		defer a.workers.Done()
		fn(ctx)
	}()
}
func (a *App) Shutdown(ctx context.Context) error {
	a.health.SetNotReady()
	time.Sleep(5 * time.Second) // note: ctx's deadline keeps ticking during this sleep
	if err := a.srv.Shutdown(ctx); err != nil {
		return err
	}
	// Wait for background workers, bounded by whatever remains of ctx
	done := make(chan struct{})
	go func() {
		a.workers.Wait()
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("workers timeout: %w", ctx.Err())
	}
}
Workers must check ctx.Done() in their loops to exit cleanly:
func consumer(ctx context.Context) {
	// queue.Receive and process are placeholders for your queue client and handler.
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}
		msg, err := queue.Receive(ctx, 10*time.Second)
		if err != nil {
			if ctx.Err() != nil {
				return
			}
			continue
		}
		process(ctx, msg)
	}
}
Shutdown order is critical: readiness flips and the load balancer removes the pod from endpoints → the HTTP server stops accepting requests and finishes in-flight ones → background workers finish their in-flight work → databases and external connections close. Close databases after workers.Wait(), not before; otherwise workers fail with "connection closed" errors on their final operations. This ordering ensures no resource is torn down while a goroutine still depends on it.
A common pattern is to pass the signal context to all workers at startup. When SIGTERM arrives, signal.NotifyContext cancels that context, so every worker sees ctx.Done() close at the same moment the main goroutine does. Workers that check ctx.Done() in their loops exit promptly, which lets workers.Wait() complete.
Kubernetes configuration and timing
[Kubernetes docs] Set terminationGracePeriodSeconds to at least 35 seconds (5s preStop + 5s in-app drain + 25s shutdown). The preStop hook's runtime counts against the grace period:
# Fragment: metadata, selector, and image omitted.
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 35
      containers:
        - name: api
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            periodSeconds: 5
            failureThreshold: 1
          lifecycle:
            preStop:
              exec:
                # exec hooks need a sleep binary inside the container image
                command: ["sleep", "5"]
The preStop hook runs before SIGTERM is sent. It adds a 5-second buffer that gives the control plane time to propagate the pod removal before your application code runs. Combined with the 5-second in-app wait, this covers most endpoint propagation delays. The readinessProbe with failureThreshold: 1 removes the pod from the load balancer on the first failed readiness check, within roughly one 5-second probe period of the flip. The livenessProbe with failureThreshold: 3 gives the app a few chances to respond before the kubelet restarts it, which prevents false positives under high load. During shutdown the liveness probe never fails because it always returns 200, so the pod is drained, never restarted.
If terminationGracePeriodSeconds is shorter than (preStop + in-app drain + Server.Shutdown timeout), Kubernetes sends SIGKILL and connections drop. The formula: preStop (5s) + drain (5s) + shutdown (25s) = 35s total. Set terminationGracePeriodSeconds to 35s or higher.
Connection timeouts and long-lived connections
[RFC 9110, 2022] Set IdleTimeout on the HTTP server (this example uses 60 seconds). Server.Shutdown closes idle connections immediately and waits for active ones to finish, but it neither closes nor waits for hijacked connections such as WebSockets. For WebSocket or other hijacked streaming connections, track and close them manually:
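import (
	"net"
	"sync"
)
// ConnTracker records hijacked connections, which Server.Shutdown ignores.
type ConnTracker struct {
	mu    sync.Mutex
	conns map[net.Conn]struct{}
}
// NewConnTracker initializes the map; Add on a zero-value ConnTracker would panic.
func NewConnTracker() *ConnTracker {
	return &ConnTracker{conns: make(map[net.Conn]struct{})}
}
func (t *ConnTracker) Add(conn net.Conn) {
	t.mu.Lock()
	t.conns[conn] = struct{}{}
	t.mu.Unlock()
}
// Remove drops a connection that ended normally, so the map doesn't grow forever.
func (t *ConnTracker) Remove(conn net.Conn) {
	t.mu.Lock()
	delete(t.conns, conn)
	t.mu.Unlock()
}
func (t *ConnTracker) CloseAll() {
	t.mu.Lock()
	defer t.mu.Unlock()
	for conn := range t.conns {
		conn.Close()
		delete(t.conns, conn)
	}
}
Call tracker.CloseAll() after srv.Shutdown() returns.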
Production checklist
- Signal handling: Use signal.NotifyContext to catch SIGTERM/SIGINT with context cancellation
- Server launch: Start the HTTP server in a background goroutine; check errors.Is(err, http.ErrServerClosed) to distinguish normal shutdown from real errors
- Health probes: Separate /healthz (liveness, always 200) and /readyz (readiness, 503 when draining)
- Readiness toggle: Call health.SetNotReady() before Server.Shutdown to remove the pod from the load balancer
- LB drain window: Sleep 5 seconds between SetNotReady() and Shutdown() for endpoint propagation
- Shutdown timeout: Use context.WithTimeout for Server.Shutdown with a 25-30 second deadline
- Idle connections: Set IdleTimeout on http.Server (60 seconds typical) to prevent connections from lingering
- Background workers: Track long-lived goroutines with sync.WaitGroup; wait after HTTP shutdown completes
- Worker context: Check ctx.Done() in worker loops (queue consumers, cron, cache warmers) and exit gracefully
- Kubernetes grace period: Set terminationGracePeriodSeconds >= (preStop + in-app drain + shutdown), typically 35-40 seconds
- PreStop hook: Add preStop: sleep 5 to the pod lifecycle to give the control plane time to propagate pod removal
- Load testing: Verify zero 5xx errors during rolling updates with sustained traffic (e.g., vegeta or k6)
Kubernetes manifest that survives a rolling deploy
The pattern below is what we run on services with sustained 1k+ RPS — preStop hook absorbs the endpoint-propagation race, terminationGracePeriodSeconds gives the in-app drain a real budget, and the readiness/liveness split prevents the kubelet from killing a draining pod prematurely:
# deployment.yaml: production drain settings
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 40 # preStop(5) + drain(5) + Server.Shutdown(25) + 5s headroom
      containers:
        - name: api
          # When this returns failure, the pod is removed from Service endpoints.
          # Keep checking the dependency tree; the pod IS still serving in-flight
          # requests, just not accepting new ones via the LB.
          readinessProbe:
            httpGet: { path: /readyz, port: 8080 }
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 2
          # Liveness only checks that the process is alive, never the dependencies.
          # Otherwise a flaky downstream restarts your healthy pod and amplifies
          # the outage instead of containing it.
          livenessProbe:
            httpGet: { path: /healthz, port: 8080 }
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 6
          lifecycle:
            preStop:
              exec:
                # Sleep > kube-proxy iptables propagation lag (typically 1-3s).
                # Critical: this gives the LB time to remove the pod from the
                # rotation BEFORE the app starts refusing new connections.
                command: ["/bin/sh", "-c", "sleep 5"]
Pair the manifest with a vegeta load test that proves zero 5xx during a rolling deploy. Paste this into your CI matrix so that a regression in shutdown handling shows up as a build failure, not a 3am page:
# Run a constant-rate load against the service for 60s, kicking off a
# kubectl rollout halfway through. Any non-2xx during the rollout is a bug.
echo "GET https://api.example.com/healthz" | \
vegeta attack -rate=200 -duration=60s -timeout=2s | \
tee /tmp/results.bin > /dev/null &
VEGETA_PID=$!
sleep 25
kubectl rollout restart deployment/api
wait "$VEGETA_PID"
vegeta report -type=text < /tmp/results.bin
# Acceptable: success rate >= 99.99%, P99 < 250ms, zero 5xx.
The two settings teams forget that cause "I see 502s during deploy" tickets:
# Set on the Ingress (metadata.annotations) so that a request which fails against a
# draining pod is retried on another endpoint. AWS ALB defaults are usually fine;
# nginx ingress needs the proxy-next-upstream annotations below, or it'll return
# the pod's 502 to the client instead of retrying.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502 http_503"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"
// And on the Go side, register the SIGTERM handler BEFORE the listener
// starts. A common mistake is registering it after, leaving a millisecond
// window where SIGTERM kills the process during startup.
package main
import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)
func main() {
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGTERM, syscall.SIGINT)
	router := http.NewServeMux() // placeholder: register your handlers here
	srv := &http.Server{Addr: ":8080", Handler: router}
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()
	<-sigCh
	// Direct shutdown; on Kubernetes, flip readiness and sleep before this call.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
Draining goroutines without leaking work
The trickiest shutdowns are not the HTTP layer — they are the long-tail goroutines that an HTTP server never tracks. Cache refreshers, log flushers, prefetchers, anomaly scanners, and queue consumers all live outside the request lifecycle, and they are the most common cause of post-deploy data loss. The pattern that actually works in production is one cancellation context fanned out to every worker, paired with a WaitGroup that the shutdown path blocks on with a hard ceiling. Without the ceiling, a single hung worker pins the pod until SIGKILL fires, so any guarantee you thought you had about clean exit silently downgrades to a forced kill.
import (
	"context"
	"errors"
	"fmt"
	"log/slog"
	"sync"
	"time"
)
type Drainer struct {
	wg     sync.WaitGroup
	cancel context.CancelFunc
	ctx    context.Context
}
func NewDrainer(parent context.Context) *Drainer {
	ctx, cancel := context.WithCancel(parent)
	return &Drainer{ctx: ctx, cancel: cancel}
}
func (d *Drainer) Go(name string, fn func(context.Context) error) {
	d.wg.Add(1)
	go func() {
		defer d.wg.Done()
		if err := fn(d.ctx); err != nil && !errors.Is(err, context.Canceled) {
			slog.Error("worker exited", "name", name, "err", err)
		}
	}()
}
func (d *Drainer) Shutdown(timeout time.Duration) error {
	d.cancel()
	done := make(chan struct{})
	go func() { d.wg.Wait(); close(done) }()
	select {
	case <-done:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("drain exceeded %s; forced exit", timeout)
	}
}
The Drainer is the contract: every background goroutine receives the same parent context, every worker exits when that context cancels, and the deadline ensures one buggy task cannot indefinitely block the deploy.
Background jobs that must survive SIGTERM
Some workloads do not want a hard cap. A long-running batch reconciliation, a checkpointed migration, or a half-written upload to object storage cannot be safely interrupted — but the HTTP layer still needs to drain in seconds. The pattern is to split the lifecycle: the HTTP server obeys the standard 25-second budget, while protected jobs persist their checkpoint, then either complete inline or hand off via a durable queue so a freshly scheduled pod can resume. Mark these jobs explicitly so reviewers see why they bypass the normal drain budget.
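// Store, processNext, and snapshot are application-specific; this Store shape is hypothetical.
type Store interface {
	SaveCheckpoint(jobID string, state []byte) error
	MarkComplete(jobID string) error
}
type CheckpointedJob struct {
	store    Store
	jobID    string
	finished atomic.Bool
}
func (j *CheckpointedJob) Run(ctx context.Context) error {
	for {
		select {
		case <-ctx.Done():
			return j.store.SaveCheckpoint(j.jobID, j.snapshot())
		default:
		}
		batch, done, err := j.processNext(ctx)
		if err != nil {
			if ctx.Err() != nil {
				// Cancelled mid-batch: persist the last consistent state before exiting.
				return j.store.SaveCheckpoint(j.jobID, j.snapshot())
			}
			return err
		}
		if done {
			j.finished.Store(true)
			return j.store.MarkComplete(j.jobID)
		}
		if err := j.store.SaveCheckpoint(j.jobID, batch); err != nil {
			return err
		}
	}
}
The discipline is to persist progress at every step, not only at the end; the code above saves a checkpoint after each batch. A pod can disappear mid-iteration, and the next pod must read the last persisted checkpoint and resume without re-emitting work. A pod can also die after a batch's side effects but before its checkpoint is saved, so make each batch idempotent.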
gRPC GracefulStop versus HTTP Shutdown
gRPC services need a parallel sequence with one important difference: grpc.Server.GracefulStop blocks until all RPCs (including streaming ones) finish. There is no built-in deadline, so wrap it in a timeout select and fall back to Stop if streams refuse to drain. The shape mirrors HTTP shutdown but the failure mode is different — a hung bidirectional stream will pin GracefulStop forever unless you escalate.
import (
	"context"
	"google.golang.org/grpc"
)
func shutdownGRPC(ctx context.Context, srv *grpc.Server) error {
	done := make(chan struct{})
	go func() { srv.GracefulStop(); close(done) }()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		srv.Stop() // hard cancel: aborts streams, releases the listener
		return ctx.Err()
	}
}
When a service exposes both HTTP and gRPC on separate ports, run the readiness flip first, then trigger both shutdowns concurrently with a shared deadline. Aggregate the errors rather than short-circuiting. If HTTP drains cleanly but gRPC times out, you still want the gRPC error reported so you can track down the streaming handler that refused to release.
Frequently Asked Questions
How does Go's http.Server.Shutdown work?
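Shutdown first closes all open listeners, then closes idle connections, and then waits for active connections to become idle (up to the context deadline) before returning. It does not interrupt active requests; it lets them finish gracefully. If the deadline passes first, Shutdown returns the context's error and leaves the remaining connections open, so call Server.Close if you need to force them shut.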
Why do I get 502 errors during Kubernetes rolling updates?
Kubernetes removes the pod from the Service endpoints asynchronously after sending SIGTERM. Requests can arrive after SIGTERM but before the load balancer updates. Fix this by marking the health endpoint unhealthy and adding a short delay before calling Shutdown.
What signal does Kubernetes send to stop a pod?
Kubernetes sends SIGTERM first and waits for terminationGracePeriodSeconds (default 30s). If the process is still running after that period, it sends SIGKILL which cannot be caught or handled.
How do you handle background goroutines during Go server shutdown?
Use sync.WaitGroup to track active background goroutines (queue consumers, cron jobs). On SIGTERM, cancel their context, then call wg.Wait() after srv.Shutdown() to ensure all background work completes before the process exits.
Engineering Team
A multidisciplinary team of backend engineers, architects, and DevOps practitioners shipping deep dives into distributed systems and production infrastructure.