#go #api-design #clean-architecture #middleware #error-handling #production

Production-Grade Go API Design: Clean Architecture, Custom Errors, and Middleware That Actually Works

Feb 8, 2026



Key Takeaways

→An un-paginated handler that eagerly joins multiple tables and serialises tens of thousands of rows per request will pin CPU within minutes when a frontend client toggles a query param — pagination, projection, and bounded response size are non-negotiable for production endpoints
→Handler-service-repository layers with dependency inversion enable testing without infrastructure — mock the service in handler tests, mock the repo in service tests, test only database behavior separately
→AppError with code, message, detail, cause, and fields lets on-call know exactly what broke without leaking sensitive data — log the detail internally, return only the safe message to clients
→/healthz (liveness: is the process running) and /readyz (readiness: is it safe to route traffic) enable Kubernetes to manage pod restarts and rolling deployments without dropping requests

One query param — ?include_details=true — pegs the CPU within minutes. A frontend release toggles it on a new dashboard widget. The handler, never paginated or optimised for deep relationship loading because "we never need that much data at once," eagerly joins multiple tables and serialises tens of thousands of rows per request. P99 latency climbs into the seconds. We've debugged this pattern on multiple Go services.

In our experience, most Go API tutorials stop when the server responds to a request^{[Go Language Specification]} — that's maybe 20% of production work. The remaining 80% is structured error handling so on-call knows what broke, middleware chains that don't collapse under load, health probes Kubernetes trusts, and layered architecture that lets you change things without touching everything.

The Short Version

Organize code into handler, service, and repository layers inside internal/. Define custom error types with HTTP^{[RFC 9110, 2022]} status codes and user-facing messages. Stack middleware in order: RequestID → Logging → Recovery → RateLimiter → Auth. Implement /healthz and /readyz probes plus graceful shutdown so deployments and restarts do not drop in-flight requests.

Handler, service, repository layers enforce dependency boundaries — no circular imports or reaching up the stack
Custom AppError with code, message, internal detail, and cause enables logging and consistent JSON responses
Middleware order matters: each layer depends on the ones above it, and Recovery wraps RateLimiter and Auth so a panic in either becomes a logged 500
Health probes and graceful shutdown handle Kubernetes lifecycle and prevent request loss during rolling deployments

graph LR
    Req[HTTP request] --> M1[RequestID]
    M1 --> M2[Logging]
    M2 --> M3[Recovery<br/>panic catcher]
    M3 --> M4[RateLimiter]
    M4 --> M5[Auth]
    M5 --> H[Handler:<br/>parse + validate]
    H --> S[Service:<br/>business logic]
    S --> R[Repository:<br/>SQL / cache / APIs]
    R -.no callback up the stack.-> S
    S -.no HTTP types.-> H
    style M3 fill:#fee
    style H fill:#eef
    style S fill:#eef
    style R fill:#eef

The diagram is the dependency-flow lesson in one picture: request enters left, traverses the middleware stack in order (Recovery before Auth so panics in expensive auth checks are caught), then drops through Handler → Service → Repository with no upward calls and no leaked HTTP types into the lower layers. Get the order wrong and a panic in middleware skips your error handler; get the layering wrong and your service can't be tested without a real HTTP server.

The Quick Start — API Design Layers

^{[Go context]}

A production Go API needs three layers: HTTP transport, business logic, and data access. Dependencies flow inward only.

Layer	Owns	Touches	Tested with
Handler (`internal/handler/`)	HTTP parsing, validation, status codes, response shape	Service interface only	`httptest.NewRecorder` + mocked service
Service (`internal/service/`)	Business logic, transaction boundaries, domain invariants	Repository interface only	Pure unit tests + mocked repository
Repository (`internal/repository/`)	SQL, cache, external API calls	Database driver, HTTP client, redis client	`testcontainers-go` for real Postgres / Redis
Middleware (`internal/middleware/`)	RequestID, logging, recovery, auth, rate-limit	None — pure http.Handler wrappers	`httptest.NewServer` + integration
Domain (`internal/domain/`)	Shared types, sentinel errors, value objects	Nothing — leaf package	Plain unit tests

order-service/
├── cmd/server/main.go           # Wire and start
├── internal/
│   ├── handler/                 # Parse HTTP, validate, respond
│   ├── service/                 # Business logic — no HTTP, no DB
│   ├── repository/              # Data access — SQL, cache, APIs
│   ├── domain/                  # Shared types and errors
│   └── middleware/              # RequestID, logging, recovery, auth
└── config/

The rule: handlers call services, services call repositories, repositories call databases. A repository never calls a handler. A service never talks directly to HTTP. This inversion prevents circular imports and makes testing trivial — mock the service in handler tests, mock the repo in service tests.

Why structure matters: when the Friday afternoon outage hits and you need to trace a request through three services and a failing database, you need to know exactly which layer is responsible. Handler bugs are HTTP problems. Service bugs are logic problems. Repository bugs are data problems. Without clear boundaries, everything becomes "the API is broken" with no fast path to diagnosis.

Custom Error Types and Error Middleware

^{[Go 1.13 error wrapping]}

Generic errors.New("something failed") starts a long on-call shift. You need errors that carry context for logging, status codes for HTTP, and safe messages for clients.

type ErrorCode string
 
const (
    ErrCodeNotFound       ErrorCode = "NOT_FOUND"
    ErrCodeValidation     ErrorCode = "VALIDATION_ERROR"
    ErrCodeUnauthorized   ErrorCode = "UNAUTHORIZED"
    ErrCodeExternal       ErrorCode = "EXTERNAL_SERVICE_ERROR"
    ErrCodeInternal       ErrorCode = "INTERNAL_ERROR"
    ErrCodeRateLimited    ErrorCode = "RATE_LIMIT_EXCEEDED"
)
 
type AppError struct {
    Code    ErrorCode
    Message string        // Safe for API response
    Detail  string        // For logs only
    Cause   error         // Wrapped error
    Fields  map[string]any
}
 
func (e *AppError) Error() string {
    if e.Cause != nil {
        return fmt.Sprintf("%s: %s: %v", e.Code, e.Message, e.Cause)
    }
    return fmt.Sprintf("%s: %s", e.Code, e.Message)
}
 
// Unwrap exposes Cause so errors.Is / errors.As traverse the wrapped error.
func (e *AppError) Unwrap() error { return e.Cause }
 
func (e *AppError) HTTPStatus() int {
    switch e.Code {
    case ErrCodeNotFound:
        return http.StatusNotFound
    case ErrCodeValidation:
        return http.StatusBadRequest
    case ErrCodeUnauthorized:
        return http.StatusUnauthorized
    case ErrCodeRateLimited:
        return http.StatusTooManyRequests
    case ErrCodeExternal:
        return http.StatusBadGateway
    default:
        return http.StatusInternalServerError
    }
}
 
// Constructors keep call sites lean
func NotFound(resource, id string) *AppError {
    return &AppError{
        Code:    ErrCodeNotFound,
        Message: fmt.Sprintf("%s not found", resource),
        Fields:  map[string]any{"resource": resource, "id": id},
    }
}
 
func ValidationError(field, reason string) *AppError {
    return &AppError{
        Code:    ErrCodeValidation,
        Message: fmt.Sprintf("invalid %s: %s", field, reason),
    }
}

In your service layer, return these errors:

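func (s *OrderService) GetOrder(ctx context.Context, id uuid.UUID) (*domain.Order, error) {
    order, err := s.repo.GetByID(ctx, id)
    if err != nil {
        // domain.ErrNotFound is an assumed sentinel: the repository translates
        // pgx.ErrNoRows into it, so the service never imports the database driver.
        if errors.Is(err, domain.ErrNotFound) {
            return nil, domain.NotFound("order", id.String())
        }
        // Anything else is unexpected: keep the cause for logs, hide it from clients.
        return nil, &domain.AppError{
            Code:    domain.ErrCodeInternal,
            Message: "failed to retrieve order",
            Cause:   err,
            Fields:  map[string]any{"order_id": id},
        }
    }
    return order, nil
}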

Then in handlers, use a shared error helper that logs internals and returns only the safe message:

type APIResponse struct {
    Data  any       `json:"data,omitempty"`
    Error *APIError `json:"error,omitempty"`
}
 
type APIError struct {
    Code    string `json:"code"`
    Message string `json:"message"`
}
 
func respondError(w http.ResponseWriter, r *http.Request, err error) {
    logger := loggerFromContext(r.Context())
 
    var appErr *domain.AppError
    if errors.As(err, &appErr) {
        // Log internal details with full context
        fields := []any{"code", appErr.Code, "request_id", requestIDFromContext(r.Context())}
        if appErr.Detail != "" {
            fields = append(fields, "detail", appErr.Detail)
        }
        if appErr.Cause != nil {
            fields = append(fields, "cause", appErr.Cause)
        }
        for k, v := range appErr.Fields {
            fields = append(fields, k, v)
        }
 
        if appErr.HTTPStatus() >= 500 {
            logger.Error("request failed", fields...)
        } else {
            logger.Warn("request failed", fields...)
        }
 
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(appErr.HTTPStatus())
        json.NewEncoder(w).Encode(APIResponse{
            Error: &APIError{Code: string(appErr.Code), Message: appErr.Message},
        })
        return
    }
 
    // Unknown error type
    logger.Error("unhandled error", "error", err)
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(http.StatusInternalServerError)
    json.NewEncoder(w).Encode(APIResponse{
        Error: &APIError{Code: "INTERNAL_ERROR", Message: "an unexpected error occurred"},
    })
}

Never expose internal error details, database table names, or stack traces to clients. Only send the user-facing message. Log the full context internally with request ID for correlation during post-mortems.

Middleware Chain and Dependency Injection

^{[Go net/http]}

Middleware order matters. Each layer depends on ones above. Stack them as:

RequestID — generates and stores X-Request-Id for all downstream logs
Logging — captures method, path, status, duration; requires RequestID
Recovery — catches panics before they crash goroutines
RateLimiter — rejects over-limit requests before expensive work
Auth — validates tokens; runs after RateLimiter to avoid waste

This ordering prevents resource exhaustion: rate limiting blocks bad actors before authentication burns CPU validating tokens, and Recovery turns a panic in either layer into a logged 500 response rather than an abruptly closed connection.

Wire in main.go:

func main() {
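    db, err := connectDB(os.Getenv("DATABASE_URL"))
    if err != nil {
        // Fail fast: a nil db would panic on the deferred Close or the first query.
        log.Fatalf("connect db: %v", err)
    }
    defer db.Close()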
 
    // Bottom-up wiring: database → repo → service → handler
    orderRepo := repository.NewOrderRepository(db)
    orderService := service.NewOrderService(orderRepo)
    orderHandler := handler.NewOrderHandler(orderService)
    healthHandler := handler.NewHealthHandler(db)
 
    router := chi.NewRouter()
    
    // Middleware in order
    router.Use(middleware.RequestID)
    router.Use(middleware.Logging(slog.Default()))
    router.Use(middleware.Recovery(slog.Default()))
    router.Use(middleware.RateLimiter(100)) // per second
 
    // Health endpoints outside auth
    router.Get("/healthz", healthHandler.Liveness)
    router.Get("/readyz", healthHandler.Readiness)
 
    // Protected routes
    router.Route("/v1", func(r chi.Router) {
        r.Use(middleware.Auth(os.Getenv("JWT_SECRET")))
        r.Post("/orders", orderHandler.CreateOrder)
        r.Get("/orders/{id}", orderHandler.GetOrder)
    })
 
    server := &http.Server{
        Addr:         ":8080",
        Handler:      router,
        ReadTimeout:  10 * time.Second,
        WriteTimeout: 30 * time.Second,
    }
 
    // Graceful shutdown on SIGTERM
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
    shutdownDone := make(chan struct{})
    go func() {
        <-sigChan
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
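        if err := server.Shutdown(ctx); err != nil {
            // Non-nil means the drain window expired with requests still in flight.
            log.Printf("graceful shutdown incomplete: %v", err)
        }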
        close(shutdownDone)
    }()
 
    // ListenAndServe returns ErrServerClosed the instant Shutdown is called;
    // block on shutdownDone so in-flight requests finish draining before exit.
    if err := server.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
        log.Fatalf("listen: %v", err)
    }
    <-shutdownDone
}

Validation and Health Probes

^{[Kubernetes docs]}

graph LR
    K8s[Kubernetes] -->|"every periodSeconds"| LP["/healthz<br/>liveness probe"]
    K8s -->|"every periodSeconds"| RP["/readyz<br/>readiness probe"]
    LP -->|always returns 200<br/>if process running| Process[Go process running?]
    RP -->|checks dependencies| DB[(DB connection)]
    RP -->|checks dependencies| Cache[(Cache client)]
    RP -->|checks dependencies| Down[Downstream services]
    Process -.->|fail → restart pod| Restart[Pod restart]
    DB -.->|fail → drop from Service endpoints| Drop[No new traffic]
    Cache -.->|fail → drop| Drop
    Down -.->|fail → drop| Drop
    style Process fill:#eef
    style Drop fill:#fee
    style Restart fill:#fee

The diagram shows the liveness vs readiness split. Liveness asks "is the process running?", and failing it triggers a restart. Readiness asks "can this pod safely serve traffic right now?", and failing it only drops the pod from Service endpoints without restarting it. A common source of incidents is putting dependency checks in the liveness probe when they belong in readiness: a downstream blip then fails liveness and sends otherwise healthy pods into restart loops.

Use go-playground/validator for declarative validation. Struct tags become self-documenting API contracts:

var validate = validator.New(validator.WithRequiredStructEnabled())
 
type CreateOrderRequest struct {
    CustomerID uuid.UUID `json:"customer_id" validate:"required"`
    Items      []Item    `json:"items"       validate:"required,min=1,dive"`
    Currency   string    `json:"currency"    validate:"required,oneof=USD EUR GBP"`
}
 
func decodeAndValidate[T any](r *http.Request) (T, error) {
    var req T
    decoder := json.NewDecoder(r.Body)
    decoder.DisallowUnknownFields() // Catch typos early
 
    if err := decoder.Decode(&req); err != nil {
        return req, domain.ValidationError("body", err.Error())
    }
    if err := validate.Struct(req); err != nil {
        return req, domain.ValidationError("body", err.Error())
    }
    return req, nil
}
 
func (h *OrderHandler) CreateOrder(w http.ResponseWriter, r *http.Request) {
    req, err := decodeAndValidate[CreateOrderRequest](r)
    if err != nil {
        respondError(w, r, err)
        return
    }
    order, err := h.service.CreateOrder(r.Context(), req)
    if err != nil {
        respondError(w, r, err)
        return
    }
    respondOK(w, http.StatusCreated, order)
}

For Kubernetes, expose two health endpoints with different semantics:

func (h *HealthHandler) Liveness(w http.ResponseWriter, r *http.Request) {
    // Returns 200 if the process is alive. No dependency checks.
    // Kubernetes uses this to restart crashed pods.
    w.WriteHeader(http.StatusOK)
}
 
func (h *HealthHandler) Readiness(w http.ResponseWriter, r *http.Request) {
    // Checks dependencies: database, cache, external services.
    // Kubernetes uses this to remove pods from load balancers.
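    // Keep this below the probe's timeoutSeconds (Kubernetes defaults to 1s),
    // or the kubelet gives up before Ping does; 5s assumes a raised probe timeout.
    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)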
    defer cancel()
 
    if err := h.db.Ping(ctx); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        return
    }
    w.WriteHeader(http.StatusOK)
}

Liveness probes restart crashed or deadlocked pods; readiness probes control whether a pod receives traffic. During a rolling deployment Kubernetes drops the terminating pod from Service endpoints asynchronously — concurrent with SIGTERM, not guaranteed before it — so the drain window in your shutdown handler is what actually prevents dropped requests.

Production Checklist

Prometheus middleware that doesn't lie about latency

A common mistake is to call time.Since(start) inside the handler, which yields histograms that exclude middleware time. The middleware below, when registered ahead of the other layers, times the full request (including RateLimiter, Auth, and JSON encoding) and labels by route pattern rather than raw URL to keep cardinality bounded: ^{[Prometheus Best Practices]}

package middleware
 
import (
	"net/http"
	"strconv"
	"time"
 
	"github.com/go-chi/chi/v5"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)
 
var (
	httpDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request duration by method, route pattern, and status.",
		// Buckets tuned for typical p50=10ms .. p99=2s API workloads.
		Buckets: []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0},
	}, []string{"method", "route", "status"})
 
	httpInFlight = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "http_requests_in_flight",
		Help: "Currently-executing HTTP requests; sustained > pool size means saturation.",
	})
)
 
type statusRecorder struct {
	http.ResponseWriter
	status int
}
 
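func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// Unwrap lets http.ResponseController (Go 1.20+) reach Flush, Hijack, and
// deadline methods on the underlying writer, which the wrapper otherwise hides.
func (s *statusRecorder) Unwrap() http.ResponseWriter { return s.ResponseWriter }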
 
func Metrics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		httpInFlight.Inc()
		defer httpInFlight.Dec()
 
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
 
		// chi.RouteContext gives the *pattern* (/users/{id}), never the raw path
		// (/users/123) — keeps the label-set bounded regardless of traffic shape.
		route := chi.RouteContext(r.Context()).RoutePattern()
		if route == "" {
			route = "unmatched"
		}
 
		httpDuration.
			WithLabelValues(r.Method, route, strconv.Itoa(rec.status)).
			Observe(time.Since(start).Seconds())
	})
}

The corresponding circuit breaker for database calls — wrap every *sql.DB (or pgx pool) at the repository layer, not at the handler, so a slow primary doesn't burn the connection pool while you wait on the request budget:

package repository
 
import (
	"context"
	"database/sql"
	"errors"
	"time"
 
	"github.com/sony/gobreaker"
)
 
type CircuitDB struct {
	db *sql.DB
	cb *gobreaker.CircuitBreaker
}
 
func NewCircuitDB(db *sql.DB) *CircuitDB {
	settings := gobreaker.Settings{
		Name:        "primary-db",
		MaxRequests: 1,
		Interval:    60 * time.Second,
		Timeout:     30 * time.Second,
		ReadyToTrip: func(c gobreaker.Counts) bool {
			// Trip once the current counting window (reset every Interval)
			// holds at least 20 calls and 60% or more of them failed.
			return c.Requests >= 20 && float64(c.TotalFailures)/float64(c.Requests) >= 0.6
		},
		IsSuccessful: func(err error) bool {
			// Treat context cancellation as success — the client gave up,
			// not the database. Otherwise a slow client would trip the breaker.
			return err == nil || errors.Is(err, context.Canceled)
		},
	}
	return &CircuitDB{db: db, cb: gobreaker.NewCircuitBreaker(settings)}
}
 
func (c *CircuitDB) QueryRowContext(ctx context.Context, q string, args ...any) (*sql.Row, error) {
	res, err := c.cb.Execute(func() (interface{}, error) {
		row := c.db.QueryRowContext(ctx, q, args...)
		// QueryRowContext never returns an error directly. Row.Err (Go 1.15+)
		// exposes connection and query failures without consuming the row, so
		// the breaker can count them. sql.ErrNoRows only surfaces at Scan and
		// therefore never counts as a failure here.
		if err := row.Err(); err != nil {
			return nil, err
		}
		return row, nil
	})
	if err != nil {
		return nil, err
	}
	return res.(*sql.Row), nil
}

The IsSuccessful rule matters: without it, a flood of cancelled contexts (slow clients, noisy load test) trips the breaker on a perfectly healthy database. Cancellations are the client's failure, not the dependency's.

Always-On pprof and runtime/trace

The hardest production incidents are the ones you cannot reproduce locally. A goroutine leak that takes six hours to manifest, a heap that climbs a megabyte per minute under real traffic, a scheduler stall that only shows up when GOMAXPROCS hits the cgroup ceiling: none of these surface in unit tests, and none survive a restart. The only durable answer is to ship every Go binary with profiling endpoints permanently mounted on a private port and to sample runtime/trace on demand. The cost is negligible (an idle pprof endpoint adds nothing measurable); the value when you need it is the difference between a five-minute fix and a war room.

Mount pprof on a separate 127.0.0.1-bound listener so it never appears on the public port, then attach a guarded /debug/trace endpoint that streams an execution trace for a bounded duration. Five seconds is usually enough to capture a stall; thirty seconds is more than enough to characterise a steady-state workload:

package main
 
import (
	"context"
	"errors"
	"net/http"
	_ "net/http/pprof" // mounts handlers on http.DefaultServeMux
	"runtime/trace"
	"strconv"
	"time"
)
 
func startDebugServer(token string) *http.Server {
	mux := http.DefaultServeMux
 
	// /debug/trace?seconds=5 — bounded execution trace download.
	mux.HandleFunc("/debug/trace", func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Debug-Token") != token {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		secs, _ := strconv.Atoi(r.URL.Query().Get("seconds"))
		if secs <= 0 || secs > 60 {
			secs = 5
		}
		w.Header().Set("Content-Type", "application/octet-stream")
		w.Header().Set("Content-Disposition", `attachment; filename="trace.out"`)
		if err := trace.Start(w); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer trace.Stop()
		select {
		case <-time.After(time.Duration(secs) * time.Second):
		case <-r.Context().Done():
		}
	})
 
	srv := &http.Server{Addr: "127.0.0.1:6060", Handler: mux}
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			panic(err)
		}
	}()
	return srv
}
 
func main() {
	dbg := startDebugServer("rotate-me-via-secret-manager")
	defer dbg.Shutdown(context.Background())
	// ... rest of main
}

The 127.0.0.1 bind is non-negotiable: net/http/pprof registers handlers like /debug/pprof/cmdline that leak the full process argv, which often contains connection strings on platforms that pass them as flags. Note that the token check above guards only /debug/trace; the pprof handlers rely on the loopback bind alone unless you wrap the whole mux in the same check. Anyone with kubectl access can port-forward the debug port; nobody from the internet should ever reach it.

A Structured Logging Contract: request_id Everywhere

A logging stack is only useful if every line emitted during a single request can be joined back together. That requires three discipline points enforced by code, not by convention. First, every request gets exactly one request_id — generated at the edge if the client did not supply one, propagated downstream in headers and contexts. Second, that request_id lands on every log line emitted while handling the request, which means no goroutine spawned from a handler may use the package-level slog.Default() without first deriving a child logger from the request context. Third, the field name is fixed across services so log queries do not have to special-case dialects.

package logging
 
import (
	"context"
	"log/slog"
	"net/http"
 
	"github.com/google/uuid"
)
 
type ctxKey struct{}
 
var loggerKey = ctxKey{}
 
// FromContext returns a logger that always carries request_id, falling back
// to the package default if the request did not pass through Inject.
func FromContext(ctx context.Context) *slog.Logger {
	if l, ok := ctx.Value(loggerKey).(*slog.Logger); ok {
		return l
	}
	return slog.Default()
}
 
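// Inject is the only middleware allowed to populate request_id. It honours an
// inbound X-Request-Id header and generates an ID otherwise. If chi's
// middleware.RequestID is also mounted, note that it stores a generated ID in
// the context rather than the header, so the two can disagree unless the
// client supplied the header.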
func Inject(base *slog.Logger) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			reqID := r.Header.Get("X-Request-Id")
			if reqID == "" {
				reqID = uuid.NewString()
				r.Header.Set("X-Request-Id", reqID)
			}
			w.Header().Set("X-Request-Id", reqID)
 
			child := base.With(
				slog.String("request_id", reqID),
				slog.String("method", r.Method),
				// Raw path, not the route pattern: routing has not run yet at this point.
				slog.String("path", r.URL.Path),
			)
			ctx := context.WithValue(r.Context(), loggerKey, child)
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

Two failure modes to guard against. A goroutine launched with go fireAndForget(req) carries no request context, so it logs without request_id and the trail dies. Fix it by handing the goroutine a context that keeps the request's values but is detached from its cancellation (context.WithoutCancel in Go 1.21+), or by passing the request-scoped logger as an explicit argument. And a downstream HTTP client must inject the same X-Request-Id header into outbound calls, otherwise the next service generates a fresh ID and the request graph fragments at every hop.

Why chi Middleware Order Is Not Negotiable

The middleware list earlier in this article (RequestID, Logging, Recovery, RateLimiter, Auth) is not aesthetic. Each layer depends on invariants established by the ones above it, and the wrong order silently breaks observability or opens denial-of-service vectors. RequestID runs first because every later log line needs it; if Logging ran first, its entries would lack a correlation ID and become unsearchable. Recovery sits above RateLimiter and Auth, not below, because a panic inside the rate limiter (for example, a nil Redis client during deploy) would otherwise bypass your error path. net/http recovers handler panics on its own, so the process survives, but the client gets an abruptly closed connection and the stack trace is logged without a request ID. Recovery cannot catch panics in goroutines that a handler spawns; those still terminate the process. RateLimiter precedes Auth because token validation is expensive (signature verification, JWKS lookup, sometimes a database call); rejecting a credential-stuffing attacker before that work begins is the difference between a brownout and a healthy service. Auth runs last among the cross-cutting layers so that authenticated routes can read identity from context, but liveness and readiness probes are mounted outside the auth subtree so Kubernetes does not need credentials to keep your pods alive.

The only exception worth memorising: tracing middleware (OpenTelemetry, Honeycomb's Beeline) must wrap RequestID, not the other way around, because trace spans are the parent record into which the request_id becomes an attribute. Reverse that and traces orphan themselves at the gateway. Audit your router.Use block in code review; reorderings are the kind of one-line change reviewers miss.

Frequently Asked Questions

How should you structure a production Go API?

Use handler, service, and repository layers in internal/. Handlers parse requests and respond. Services hold business logic. Repositories handle data access. Dependencies flow inward — a repository never calls a handler.

How do you implement structured error handling?

Define AppError with code, user-facing message, internal detail, cause error, and structured fields. Map codes to HTTP status. Route every handler error through a shared helper that logs full details and returns only the safe message to clients.

What health endpoints should you expose?

Expose /healthz (liveness probe — returns 200 if running) and /readyz (readiness probe — checks database and cache). Kubernetes uses these to manage pod restarts and traffic routing.

How do you implement graceful shutdown?

Call server.Shutdown(ctx) on SIGTERM. This stops accepting new connections and waits for in-flight requests to complete within a timeout, preventing dropped requests during pod termination.

Keep Reading

Go error handling patterns — deeper dive into custom error types and error wrapping
Kubernetes Networking Deep Dive — service routing, ingress, and the network plane behind your Go API
Go Graceful Shutdown in Production — clean lifecycle and request draining for layered Go services


BackendBytes Engineering Team

Engineering Team

A multidisciplinary team of backend engineers, architects, and DevOps practitioners shipping deep dives into distributed systems and production infrastructure.