#go #viper #configuration #yaml #docker #kubernetes

Go Configuration Mastery: Production Patterns with Viper

BackendBytes Engineering Team

Feb 1, 2026

17 min read

You edit the ConfigMap mid-sale to kill a heavy ML model. Nothing happens. The service uses Viper, but nobody called viper.WatchConfig(). By the time the deployment rolls 200 pods, the window has closed and the rollback costs more than the original problem. We've debugged variants of this on multiple Go services.

Configuration drift costs. A missed WatchConfig() call. An env var silently overriding a database host. A rotated secret that the service still doesn't know about. Viper solves these problems — but only if you set it up correctly.

This guide covers the production pattern: typed config structs with validation, the Viper precedence hierarchy, Kubernetes integration, live reload for feature flags, and zero-restart secrets rotation.

TL;DR

Use a typed config struct with go-playground/validator validation, map env vars to keys with SetEnvPrefix and SetEnvKeyReplacer, watch file-mounted ConfigMaps for changes with WatchConfig + OnConfigChange, and validate new config before swapping.

Unmarshal config into a struct; never scatter viper.GetString() calls
Test config loading with table-driven tests covering env var overrides and invalid values
Rotate secrets by watching mounted files, not environment variables

graph TD
    Defaults[Defaults<br/>SetDefault] -->|lowest priority| Resolve{Viper.Get}
    KV[K/V store<br/>etcd / Consul] -->|↑| Resolve
    File[Config file<br/>config.yaml] -->|↑| Resolve
    Env[Env vars<br/>SetEnvPrefix] -->|↑| Resolve
    Flag[CLI flags<br/>BindPFlag] -->|↑| Resolve
    Set[Explicit Set] -->|highest| Resolve
    Resolve --> Cfg[Typed Config struct<br/>+ validator tags]
    Cfg -->|fail closed| Boot[Service starts]
    Watch[fsnotify on file] -.->|change| OnChange[OnConfigChange]
    OnChange -.->|re-unmarshal + revalidate| Cfg
    style Set fill:#fee
    style Cfg fill:#efe
    style Boot fill:#efe

The diagram shows the precedence ladder plus the hot-reload loop. The "fail closed" arrow is the discipline: if validation fails after a reload, keep the old config rather than swap in something invalid. That's the bug most teams miss. ^{[Viper docs]}

When to Use Viper

^{[Viper docs]}

Viper is right for services with hierarchical, multi-source config (files, env vars, flags, remote stores). Use it when:

Config spans files, environment variables, and flags
You need a typed struct with nested fields
Hot reload matters (feature flags, timeouts)
You deploy to Kubernetes with ConfigMaps and Secrets

Pick the right tool by counting the sources you actually need:

Tool	Sources	Hot reload	Best for
`os.Getenv`	env vars only	No	Under 10 flat values; quick scripts; no struct
`envconfig`	env vars + struct tags	No	Single-source services with typed config
`koanf`	files / env / KV / flags	Yes (per-source)	New projects wanting a modern modular API
Viper	files / env / flags / KV stores	Yes via `WatchConfig`	Multi-source; hot-reload; large existing Go ecosystems
`spf13/cobra` + `viper`	+ CLI subcommand binding	Yes	CLI tools that also need structured config files

Use os.Getenv for < 10 flat config values. Use koanf for new projects wanting a modern API. For existing Viper codebases, stay: v1 is stable and widely deployed.

Precedence Hierarchy

Viper resolves in this order (highest to lowest): explicit Set() calls, CLI flags, environment variables, config files, key/value stores, defaults^{[Viper docs]}.

graph TD
    Code[viper.Get database.host] --> P1{Was Set explicitly?}
    P1 -->|Yes| Win1[Return explicit value]
    P1 -->|No| P2{Bound to flag<br/>and flag passed?}
    P2 -->|Yes| Win2[Return flag value]
    P2 -->|No| P3{Env var set?<br/>APP_DATABASE_HOST}
    P3 -->|Yes| Win3[Return env value]
    P3 -->|No| P4{Config file<br/>has key?}
    P4 -->|Yes| Win4[Return file value]
    P4 -->|No| P5{Remote KV store?<br/>etcd, consul}
    P5 -->|Yes| Win5[Return remote value]
    P5 -->|No| P6{SetDefault called?}
    P6 -->|Yes| Win6[Return default]
    P6 -->|No| Zero[Zero value]
    style Win3 fill:#dfd
    style Win4 fill:#dfd
    style Win6 fill:#ffd
    style Zero fill:#fdd

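The flowchart traces a single lookup of database.host down the ladder: the first source that defines the key wins, and a key that no source defines comes back as the zero value. This is what drives the Kubernetes deploy story: image-baked YAML provides the safe defaults, and ConfigMaps and Secrets injected as env vars override the file per environment. Both sit at the same env-var precedence, so neither outranks the other. You never rebuild just to change non-secret config.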
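As a mental model, the lookup order described above can be sketched as a walk over an ordered list of sources. This is a simplified illustration, not Viper's actual implementation; the source type and the resolve function are hypothetical names introduced only for this sketch.

```go
package main

// source is one rung of the precedence ladder: a label plus the keys it defines.
// Hypothetical type for illustration; Viper keeps its own internal registries.
type source struct {
	name   string
	values map[string]string
}

// resolve walks the ladder from highest to lowest precedence and returns the
// first value found, together with the name of the source that supplied it.
// ok is false when no source defines the key.
func resolve(key string, ladder []source) (value, from string, ok bool) {
	for _, s := range ladder {
		if v, found := s.values[key]; found {
			return v, s.name, true
		}
	}
	return "", "", false
}
```

Calling resolve with a ladder ordered as set, flag, env, file, kv, default shows why an env var silently beats the file: the walk stops at the first rung that has the key and never consults the ones below it.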

Key gotcha: Viper keys are case-insensitive. It lowercases them internally, so Database.Host and database.host are the same key and both read the env var APP_DATABASE_HOST. Use SetEnvPrefix("APP") and SetEnvKeyReplacer(strings.NewReplacer(".", "_")) consistently to avoid silent overrides.
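To make the key-to-env-var mapping concrete, here is a minimal sketch of the naming rule that SetEnvPrefix("APP") plus a "." to "_" replacer produces. The envName helper is hypothetical and only mirrors the behaviour described above; it is not Viper's own code.

```go
package main

import "strings"

// envName sketches how a dotted config key maps to an environment variable
// name when a prefix and a "." -> "_" key replacer are configured.
// Assumption: this mirrors the mapping described in the text, nothing more.
func envName(prefix, key string) string {
	replacer := strings.NewReplacer(".", "_")
	return strings.ToUpper(prefix + "_" + replacer.Replace(key))
}
```

Because the result is upper-cased, keys that differ only in case map to the same variable, which is exactly how two spellings of one key end up sharing a single override.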

Typed Config Struct Pattern

Never scatter viper.GetString("database.host") throughout your codebase. Unmarshal into a typed struct at startup:

package config
 
import (
    "fmt"
    "log/slog"
    "strings"
    "time"
 
    "github.com/go-playground/validator/v10"
    "github.com/spf13/viper"
)
 
type Config struct {
    Service    ServiceConfig    `mapstructure:"service"`
    Database   DatabaseConfig   `mapstructure:"database"`
    Features   FeatureConfig    `mapstructure:"features"`
    Monitoring MonitoringConfig `mapstructure:"monitoring"`
}
 
type ServiceConfig struct {
    Name    string        `mapstructure:"name"    validate:"required"`
    Version string        `mapstructure:"version" validate:"required,semver"`
    Port    int           `mapstructure:"port"    validate:"required,gte=1024,lte=65535"`
    Timeout time.Duration `mapstructure:"timeout" validate:"required,min=1s,max=60s"`
}
 
type DatabaseConfig struct {
    Host     string `mapstructure:"host"     validate:"required,hostname|ip"`
    Port     int    `mapstructure:"port"     validate:"required,gte=1,lte=65535"`
    User     string `mapstructure:"user"     validate:"required"`
    Password string `mapstructure:"password" validate:"required,min=8"`
    Name     string `mapstructure:"name"     validate:"required"`
    SSLMode  string `mapstructure:"ssl_mode" validate:"oneof=disable require verify-ca verify-full"`
    Pool     struct {
        MaxConnections int           `mapstructure:"max_connections" validate:"gte=1,lte=500"`
        IdleTimeout    time.Duration `mapstructure:"idle_timeout"`
    } `mapstructure:"pool"`
}
 
type FeatureConfig struct {
    NewRecommendationEngine bool          `mapstructure:"new_recommendation_engine"`
    ABTestingEnabled        bool          `mapstructure:"ab_testing_enabled"`
    CacheTTL                time.Duration `mapstructure:"cache_ttl"`
}
 
type MonitoringConfig struct {
    MetricsEnabled      bool          `mapstructure:"metrics_enabled"`
    TracingSampleRate   float64       `mapstructure:"tracing_sample_rate" validate:"gte=0,lte=1"`
    HealthCheckInterval time.Duration `mapstructure:"health_check_interval"`
}

func Load(configPath string) (*Config, error) {
    v := viper.New()
    setDefaults(v)
 
    v.SetConfigName("config")
    v.SetConfigType("yaml")
    v.AddConfigPath(configPath)
    v.AddConfigPath(".")
 
    v.AutomaticEnv()
    v.SetEnvPrefix("APP")
    v.SetEnvKeyReplacer(strings.NewReplacer(".", "_"))
 
    // AutomaticEnv only overrides keys Viper already knows from a default, the
    // config file, or a flag. Keys that arrive *only* via env (DB credentials,
    // name) are invisible to Unmarshal unless bound explicitly — otherwise they
    // silently stay empty. See spf13/viper#761.
    for _, key := range []string{"database.user", "database.password", "database.name"} {
        _ = v.BindEnv(key)
    }
 
    if err := v.ReadInConfig(); err != nil {
        if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
            return nil, fmt.Errorf("error reading config file: %w", err)
        }
        // Config file is optional — environment variables and defaults are sufficient
    }
 
    var cfg Config
    if err := v.Unmarshal(&cfg); err != nil {
        return nil, fmt.Errorf("unable to decode config: %w", err)
    }
 
    if err := validate(cfg); err != nil {
        return nil, fmt.Errorf("config validation failed: %w", err)
    }
 
    return &cfg, nil
}
 
func setDefaults(v *viper.Viper) {
    v.SetDefault("service.port", 8080)
    v.SetDefault("service.timeout", "30s")
    v.SetDefault("database.port", 5432)
    v.SetDefault("database.ssl_mode", "require")
    v.SetDefault("database.pool.max_connections", 25)
    v.SetDefault("database.pool.idle_timeout", "10m")
    v.SetDefault("features.cache_ttl", "5m")
    v.SetDefault("monitoring.metrics_enabled", true)
    v.SetDefault("monitoring.tracing_sample_rate", 0.1)
    v.SetDefault("monitoring.health_check_interval", "30s")
}

Validate Early, Fail Fast

Use go-playground/validator to validate the struct after Unmarshal. Run field-level validation (required, gte, oneof tags) plus cross-field rules (e.g., if A/B testing enabled, recommendation engine must be enabled). Validation failure blocks startup — far better than discovering invalid config under load.

func validate(cfg Config) error {
    v := validator.New()
    if err := v.Struct(cfg); err != nil {
        return err
    }
 
    // Cross-field: if A/B testing enabled, recommendation engine must be too
    if cfg.Features.ABTestingEnabled && !cfg.Features.NewRecommendationEngine {
        return fmt.Errorf("ab_testing requires new_recommendation_engine")
    }
 
    return nil
}

Log Resolved Config (Redacted)

Log what config the service actually loaded at startup so operators can verify it during incidents. Critical rule: redact all secrets.

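// LogConfig logs the resolved config with secrets redacted. source is the
// config file path reported by the same Viper instance that Load used
// (v.ConfigFileUsed()). The package-level viper.ConfigFileUsed() is always
// empty here because Load reads through viper.New(), not the global instance.
func LogConfig(cfg *Config, source string) {
    redacted := *cfg // shallow copy, so the caller's password stays intact
    redacted.Database.Password = "***REDACTED***"
    slog.Info("config loaded",
        "service", redacted.Service.Name,
        "port", redacted.Service.Port,
        "db.host", redacted.Database.Host,
        "db.ssl_mode", redacted.Database.SSLMode,
        "source", source,
    )
}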

For many secrets, automate redaction with struct tags (log:"redact") and reflection — never rely on manual care.
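A minimal sketch of the tag-and-reflection approach, assuming a `log:"redact"` struct tag as the marker. The Redacted function and the placeholder string are illustrative names, not part of any library.

```go
package main

import "reflect"

const redactedValue = "***REDACTED***"

// Redacted flattens a struct (or pointer to struct) into a map suitable for
// logging. Fields tagged `log:"redact"` are replaced by a placeholder, nested
// structs are walked recursively, and unexported fields are skipped.
func Redacted(v any) map[string]any {
	out := make(map[string]any)
	rv := reflect.ValueOf(v)
	for rv.Kind() == reflect.Pointer {
		if rv.IsNil() {
			return out
		}
		rv = rv.Elem()
	}
	if rv.Kind() != reflect.Struct {
		return out
	}
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		field := rt.Field(i)
		if !field.IsExported() {
			continue
		}
		switch {
		case field.Tag.Get("log") == "redact":
			out[field.Name] = redactedValue
		case field.Type.Kind() == reflect.Struct:
			out[field.Name] = Redacted(rv.Field(i).Interface())
		default:
			out[field.Name] = rv.Field(i).Interface()
		}
	}
	return out
}
```

With this in place, adding a new secret field only requires the tag; the logging call passes Redacted(cfg) and never touches individual fields. One limitation of the sketch: struct-typed values with only unexported fields, such as time.Time, come out as empty maps.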

Config Files and Kubernetes Integration

^{[Kubernetes docs]}

Store YAML defaults in the image; override with environment variables from Kubernetes ConfigMaps (non-secrets) and Secrets:

# config.yaml — baked into image
service:
  name: "user-service"
  port: 8080
database:
  host: "localhost"
  port: 5432
  ssl_mode: "require"
  # user, password: never in YAML — from APP_DATABASE_USER env var
features:
  new_recommendation_engine: false

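In Kubernetes, env vars sourced from ConfigMaps and from Secrets both override the file defaults. They share the same precedence level, so keep the keys each one sets disjoint: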

env:
  # From ConfigMap
  - name: APP_DATABASE_HOST
    valueFrom:
      configMapKeyRef:
        name: database-config
        key: host
  # From Secret
  - name: APP_DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: password

Never put passwords, API keys, or TLS certs in YAML files.

Live Reload for Feature Flags

^{[Viper docs]}

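Watch config files for changes with WatchConfig() + OnConfigChange(). In Kubernetes this only picks up ConfigMaps mounted as files; values injected as env vars are fixed when the pod starts. Validate new config before swapping: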

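type Manager struct {
    mu     sync.RWMutex
    config Config
}

// Watch registers the reload callback on v, the same *viper.Viper that loaded
// the initial config, and then starts the file watcher.
func (m *Manager) Watch(v *viper.Viper) {
    v.OnConfigChange(func(e fsnotify.Event) {
        var newCfg Config
        // A decode error must also keep the previous config (fail closed).
        if err := v.Unmarshal(&newCfg); err != nil {
            slog.Error("reload decode failed, keeping previous", "error", err)
            return
        }

        if err := validate(newCfg); err != nil {
            slog.Error("reload validation failed, keeping previous", "error", err)
            return
        }

        m.mu.Lock()
        m.config = newCfg
        m.mu.Unlock()
        slog.Info("config reloaded", "file", e.Name)
    })
    v.WatchConfig()
}

// Current returns a copy of the active config for readers.
func (m *Manager) Current() Config {
    m.mu.RLock()
    defer m.mu.RUnlock()
    return m.config
}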

For read-heavy services, use sync/atomic.Value (or the typed atomic.Pointer) instead of sync.RWMutex: a read becomes a single atomic load with no lock contention. Live reload is valid only for feature flags, sample rates, and timeouts. Never live-reload database credentials or TLS certs; use file-based secrets rotation (below) instead.

Secrets Rotation Without Restart

Kubernetes Secrets as environment variables don't update after the pod starts. Mount secrets as files instead, watch them with fsnotify^{[Kubernetes docs]}, and reconnect when rotation occurs:

// PoolHolder publishes the live pool via an atomic pointer. Callers must read
// the current pool with holder.Get() for every query so a rotation is picked up
// on the next call — never cache the *pgxpool.Pool.
type PoolHolder struct {
    ptr atomic.Pointer[pgxpool.Pool]
}
 
func (h *PoolHolder) Get() *pgxpool.Pool { return h.ptr.Load() }
 
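func Watch(ctx context.Context, secretPath string, holder *PoolHolder) {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        slog.Error("secret watcher init failed", "error", err)
        return
    }
    defer watcher.Close()

    // Kubernetes updates a mounted Secret by atomically swapping a symlink in
    // the mount directory, so the secret file itself never receives a Write
    // event. Watch the parent directory and react to Create as well.
    dir := filepath.Dir(secretPath)
    if err := watcher.Add(dir); err != nil {
        slog.Error("secret watch add failed", "path", dir, "error", err)
        return
    }

    for {
        select {
        case event, ok := <-watcher.Events:
            if !ok {
                return // watcher closed
            }
            if event.Op&(fsnotify.Create|fsnotify.Write) != 0 {
                if err := rotate(ctx, secretPath, holder); err != nil {
                    slog.Error("rotate failed", "error", err)
                }
            }
        case err, ok := <-watcher.Errors:
            if !ok {
                return
            }
            slog.Error("secret watcher error", "error", err)
        case <-ctx.Done():
            return
        }
    }
}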
 
func rotate(ctx context.Context, secretPath string, holder *PoolHolder) error {
    creds, err := loadCredentials(secretPath)
    if err != nil {
        return fmt.Errorf("load credentials: %w", err)
    }
    newPool, err := pgxpool.New(ctx, creds.ConnString())
    if err != nil {
        return fmt.Errorf("open new pool: %w", err)
    }
    if err := newPool.Ping(ctx); err != nil {
        newPool.Close() // don't leak the pool we're about to discard
        return fmt.Errorf("verify new pool: %w", err)
    }
 
    // Publish the new pool atomically, then drain and close the old one in the
    // background so in-flight queries on it finish first.
    old := holder.ptr.Swap(newPool)
    if old != nil {
        go func() {
            time.Sleep(30 * time.Second)
            old.Close()
        }()
    }
    return nil
}

Mount the secret as a file volume:

volumes:
  - name: db-secret
    secret:
      secretName: database-secret
containers:
  - volumeMounts:
      - name: db-secret
        mountPath: /run/secrets/db

Testing Config Loading

Use table-driven tests to catch invalid configs before production:

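func TestLoad(t *testing.T) {
    // validYAML (defined elsewhere) must set every required field except
    // database.password, which the cases below supply or omit via env.
    tests := []struct {
        name    string
        yaml    string
        envVars map[string]string
        wantErr string
    }{
        {
            name:    "valid config",
            yaml:    validYAML,
            envVars: map[string]string{"APP_DATABASE_PASSWORD": "secure123"},
        },
        {
            name: "missing password",
            yaml: validYAML,
            // validator reports "Field validation for 'Password' failed on
            // the 'required' tag", so match on that fragment.
            wantErr: "'Password' failed on the 'required' tag",
        },
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            dir := t.TempDir()
            if err := os.WriteFile(filepath.Join(dir, "config.yaml"), []byte(tt.yaml), 0o644); err != nil {
                t.Fatal(err)
            }
            for k, v := range tt.envVars {
                t.Setenv(k, v)
            }
            _, err := config.Load(dir)
            switch {
            case tt.wantErr == "" && err != nil:
                t.Fatalf("unexpected error: %v", err)
            case tt.wantErr != "" && (err == nil || !strings.Contains(err.Error(), tt.wantErr)):
                t.Fatalf("expected error containing %q, got %v", tt.wantErr, err)
            }
        })
    }
}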

Use t.Setenv (auto-restore) + t.TempDir for isolation. Run tests with -race to catch concurrent access bugs.

Common Pitfalls

Silent key collisions: Viper keys are case-insensitive (it lowercases them internally), so api_key and API_KEY name the same key and both map to the env var APP_API_KEY. Use SetEnvPrefix("APP") consistently.
Boolean ambiguity: Env vars are strings. Viper treats "true", "1", "t" as true, but empty string "" silently becomes false. Validate with go-playground/validator.
Init-function antipattern: Never load config in init() — makes testing impossible. Pass config as a dependency.
Secret leakage: If validation fails on a struct with DB credentials, error messages may dump them. Redact connection strings before logging.
Scattered GetString calls: Global state smell. Always unmarshal to a struct and pass it around.
Debugging precedence: Use viper.Debug() to print resolution order — invaluable when env vars silently override files.
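The boolean pitfall in the list above can be closed by parsing flag-like env vars strictly, so that an empty or misspelled value is an error rather than a silent false. A minimal sketch using only the standard library; parseFlag is a hypothetical helper, not a Viper or validator API.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseFlag converts the raw string value of a boolean setting into a bool.
// Unlike a lenient cast, it rejects empty and unrecognised values so a typo
// such as "yes" or "on" fails at startup instead of disabling a feature.
func parseFlag(name, raw string) (bool, error) {
	if raw == "" {
		return false, fmt.Errorf("%s is set but empty", name)
	}
	b, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("%s=%q is not a boolean: %w", name, raw, err)
	}
	return b, nil
}
```

Running this over the handful of boolean env vars before Unmarshal keeps the failure at startup, where the rest of the validation already lives.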

Production Checklist

Config unmarshalled into typed struct — no scattered viper.GetString calls
Required fields + cross-field rules validated with go-playground/validator
Secrets in environment variables or file mounts — never in YAML
Startup log prints resolved config with secrets redacted
SetEnvPrefix("APP") + SetEnvKeyReplacer(strings.NewReplacer(".", "_")) set
Live reload validates config before swapping; invalid reloads log error and keep previous
File-based secrets with fsnotify watcher + pool-swap rotation (not env vars)
Table-driven tests: missing fields, out-of-range, env var overrides, cross-field rules
viper.Debug() available for troubleshooting precedence in staging

Secrets Management Beyond Environment Variables

Environment variables are the floor, not the ceiling. They are readable through /proc/[pid]/environ, they are inherited by every child process, they can show up in crash dumps and debug output, and they require a pod restart to rotate. For anything beyond a side project, route credentials through a secrets backend: HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, or Kubernetes Secrets mounted as files. The pattern is the same in every case: fetch at startup, cache in memory, refresh on a TTL, and surface failures loudly.

The HashiCorp Vault pattern uses short-lived database credentials issued by Vault's database secret engine. The service authenticates via Kubernetes auth, requests a lease, and renews before expiry. If renewal fails the service shuts down rather than serve traffic with stale creds — fail closed, never silently degrade.

package secrets
 
import (
    "context"
    "fmt"
    "log/slog"
    "time"
 
    vault "github.com/hashicorp/vault/api"
    auth "github.com/hashicorp/vault/api/auth/kubernetes"
)
 
type VaultClient struct {
    client *vault.Client
    role   string
    path   string
}
 
func NewVaultClient(addr, role, k8sPath string) (*VaultClient, error) {
    cfg := vault.DefaultConfig()
    cfg.Address = addr
    client, err := vault.NewClient(cfg)
    if err != nil {
        return nil, fmt.Errorf("vault client: %w", err)
    }
 
    k8sAuth, err := auth.NewKubernetesAuth(role,
        auth.WithServiceAccountTokenPath("/var/run/secrets/kubernetes.io/serviceaccount/token"))
    if err != nil {
        return nil, fmt.Errorf("k8s auth: %w", err)
    }
 
    if _, err := client.Auth().Login(context.Background(), k8sAuth); err != nil {
        return nil, fmt.Errorf("vault login: %w", err)
    }
    return &VaultClient{client: client, role: role, path: k8sPath}, nil
}
 
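func (vc *VaultClient) DatabaseCreds(ctx context.Context) (user, pass string, leaseTTL time.Duration, err error) {
    secret, err := vc.client.Logical().ReadWithContext(ctx, "database/creds/"+vc.role)
    if err != nil {
        return "", "", 0, fmt.Errorf("read creds: %w", err)
    }
    // Vault returns a nil secret with a nil error when nothing exists at the path.
    if secret == nil || secret.Data == nil {
        return "", "", 0, fmt.Errorf("read creds: no secret at database/creds/%s", vc.role)
    }
    // Checked assertions: a malformed response must not panic the service.
    user, okUser := secret.Data["username"].(string)
    pass, okPass := secret.Data["password"].(string)
    if !okUser || !okPass {
        return "", "", 0, fmt.Errorf("read creds: response missing username or password")
    }
    leaseTTL = time.Duration(secret.LeaseDuration) * time.Second
    return user, pass, leaseTTL, nil
}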

The renewal loop runs in a background goroutine. Renew at 70% of the lease TTL — early enough to absorb network blips, late enough to avoid hammering Vault. If the renewal fails twice in a row, log loudly and trigger a graceful shutdown so Kubernetes can replace the pod with a fresh lease. ^{[Kubernetes docs]}

For AWS environments, Secrets Manager covers the same use case with IAM authentication. The Go SDK v2 client does not cache secret values itself, so wrap it in a small TTL cache like the one below and set the TTL to match your rotation cadence. Never call GetSecretValue on every request: you will hit throttling and add latency to every database query.

package secrets
 
import (
    "context"
    "encoding/json"
    "sync"
    "time"
 
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/secretsmanager"
)
 
type DBCreds struct {
    Username string `json:"username"`
    Password string `json:"password"`
    Host     string `json:"host"`
}
 
type CachedSecrets struct {
    mu      sync.RWMutex
    client  *secretsmanager.Client
    secrets map[string]cached
    ttl     time.Duration
}
 
type cached struct {
    creds   DBCreds
    fetched time.Time
}
 
func NewCachedSecrets(ctx context.Context, ttl time.Duration) (*CachedSecrets, error) {
    cfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        return nil, err
    }
    return &CachedSecrets{
        client:  secretsmanager.NewFromConfig(cfg),
        secrets: make(map[string]cached),
        ttl:     ttl,
    }, nil
}
 
func (cs *CachedSecrets) Get(ctx context.Context, id string) (DBCreds, error) {
    cs.mu.RLock()
    if c, ok := cs.secrets[id]; ok && time.Since(c.fetched) < cs.ttl {
        cs.mu.RUnlock()
        return c.creds, nil
    }
    cs.mu.RUnlock()
 
    out, err := cs.client.GetSecretValue(ctx, &secretsmanager.GetSecretValueInput{SecretId: &id})
    if err != nil {
        return DBCreds{}, err
    }
    var creds DBCreds
    if err := json.Unmarshal([]byte(*out.SecretString), &creds); err != nil {
        return DBCreds{}, err
    }
    cs.mu.Lock()
    cs.secrets[id] = cached{creds: creds, fetched: time.Now()}
    cs.mu.Unlock()
    return creds, nil
}

For Kubernetes-native deployments without an external secrets store, the projected service account token plus a Secret mounted as a file gives you the same fail-closed shape with zero extra infrastructure. The catch: the kubelet refreshes mounted Secrets on its periodic sync, so a rotation can take on the order of a minute to reach the pod. That is fine for most rotations and too slow for emergency credential revocation. For incident response, pair file mounts with a sidecar that polls Vault directly.

Match the backend to your blast radius. Vault for multi-cloud and short-lived dynamic credentials. AWS Secrets Manager when you are already locked into one cloud and need IAM-based access policies. Kubernetes Secrets for low-stakes services where the cluster is the trust boundary anyway.

Configuration Migration Patterns

Configs are not append-only. Fields get renamed, defaults shift, nested objects flatten, and units change from seconds to milliseconds. Every time you ship a breaking config change, somewhere a deploy fails because production still has the old YAML. The discipline that prevents this is treating the config schema like an API: version it, support both shapes during the transition, and remove the old one only after every environment has migrated.

The first lever is a version field at the top of the config struct. The loader inspects the version, runs migrations forward to the current schema, and surfaces a deprecation warning when an old version is detected. The migration ladder is a sequence of pure functions, one per version bump.

package config
 
import (
    "fmt"
    "log/slog"
    "time"
)
 
type RawConfig struct {
    Version int                    `mapstructure:"version"`
    Raw     map[string]interface{} `mapstructure:",remain"`
}
 
type migration func(map[string]interface{}) (map[string]interface{}, error)
 
var migrations = []migration{
    nil,        // v0 — never used
    migrateV1ToV2,
    migrateV2ToV3,
}
 
const currentVersion = 3
 
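func Migrate(raw RawConfig) (map[string]interface{}, error) {
    // A negative version would index outside the migrations slice and panic.
    if raw.Version < 0 {
        return nil, fmt.Errorf("config version %d is invalid", raw.Version)
    }
    if raw.Version > currentVersion {
        return nil, fmt.Errorf("config version %d newer than supported %d: upgrade the binary",
            raw.Version, currentVersion)
    }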
    data := raw.Raw
    for v := raw.Version; v < currentVersion; v++ {
        m := migrations[v] // migrations[v] takes v -> v+1 (index 1 = migrateV1ToV2)
        if m == nil {
            return nil, fmt.Errorf("no migration defined for v%d -> v%d", v, v+1)
        }
        next, err := m(data)
        if err != nil {
            return nil, fmt.Errorf("migrate v%d -> v%d: %w", v, v+1, err)
        }
        slog.Warn("config migrated", "from", v, "to", v+1)
        data = next
    }
    return data, nil
}
 
// migrateV1ToV2 flattened database.timeout.{read,write} into two top-level fields.
func migrateV1ToV2(in map[string]interface{}) (map[string]interface{}, error) {
    db, ok := in["database"].(map[string]interface{})
    if !ok {
        return in, nil
    }
    timeout, ok := db["timeout"].(map[string]interface{})
    if !ok {
        return in, nil
    }
    db["read_timeout"] = timeout["read"]
    db["write_timeout"] = timeout["write"]
    delete(db, "timeout")
    return in, nil
}
 
// migrateV2ToV3 changed cache_ttl from int seconds to a duration string.
func migrateV2ToV3(in map[string]interface{}) (map[string]interface{}, error) {
    feat, ok := in["features"].(map[string]interface{})
    if !ok {
        return in, nil
    }
    if secs, ok := feat["cache_ttl"].(int); ok {
        feat["cache_ttl"] = (time.Duration(secs) * time.Second).String()
    }
    return in, nil
}

The second lever is dual-read during deprecation windows. When you rename database.host to database.primary_host, the loader reads both — preferring the new name, falling back to the old, and emitting a structured warning that operators can grep in logs. Keep the dual-read for at least one full release cycle, ideally two. Remove the alias only after metrics confirm zero environments still use the old field.

A deprecated registry inside the config package centralises the alias mappings. Every alias has an introduction date and a removal target — when CI sees an alias whose removal date has passed, the build fails. That keeps deprecation cleanup from rotting in the backlog forever.
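A minimal sketch of the dual-read plus registry idea, under two assumptions: settings are available as a flattened map keyed by dotted path, and the dates shown are placeholders rather than real deadlines. The alias type and the lookup and expired functions are hypothetical names.

```go
package main

import (
	"log/slog"
	"time"
)

// alias maps a deprecated key to its replacement, with the dates CI checks.
type alias struct {
	Old, New   string
	Introduced time.Time
	RemoveBy   time.Time
}

// deprecated is the central registry. The dates are illustrative placeholders.
var deprecated = []alias{{
	Old:        "database.host",
	New:        "database.primary_host",
	Introduced: time.Date(2026, time.February, 1, 0, 0, 0, 0, time.UTC),
	RemoveBy:   time.Date(2026, time.August, 1, 0, 0, 0, 0, time.UTC),
}}

// lookup reads key, preferring the new name and falling back to a deprecated
// alias with a structured warning that operators can grep for.
func lookup(settings map[string]any, key string) (any, bool) {
	if v, ok := settings[key]; ok {
		return v, true
	}
	for _, a := range deprecated {
		if a.New != key {
			continue
		}
		if v, ok := settings[a.Old]; ok {
			slog.Warn("deprecated config key in use",
				"old", a.Old, "new", a.New, "remove_by", a.RemoveBy.Format("2006-01-02"))
			return v, true
		}
	}
	return nil, false
}

// expired lists aliases whose removal date has passed; a CI check can fail
// the build whenever the result is non-empty.
func expired(now time.Time) []alias {
	var out []alias
	for _, a := range deprecated {
		if now.After(a.RemoveBy) {
			out = append(out, a)
		}
	}
	return out
}
```

A CI test that asserts expired(time.Now()) is empty is enough to turn a forgotten alias into a failing build.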

The third lever is a config schema test that validates the on-disk YAML examples (config.example.yaml, config.production.yaml) load cleanly under the current loader. Run it in CI. The test catches schema drift the same release the drift is introduced — not three weeks later when ops rolls out the change.

Hot Reload Safely

Hot reload is the single feature most often overused. The right mental model is a strict allowlist of fields safe to change at runtime — the rest require a full restart so connections, caches, and goroutines tear down cleanly. Reload the wrong field and you ship a service that lies about its configuration: the struct says one thing, the running code is still bound to the old value.

Safe to reload:

Feature flags (boolean toggles read on every request)
Sample rates and rate-limit thresholds
Timeouts and retry budgets that flow through context.WithTimeout
Log level and log format
Cache TTLs read at write-time

Unsafe to reload — restart instead:

Database connection strings, pool sizes, TLS configuration
HTTP listener addresses, ports, server timeouts already wrapped in http.Server
Cryptographic keys bound to long-lived signers
gRPC client targets, mTLS certificates loaded into transport credentials
Anything that touches a goroutine pool sized at startup

The pattern that makes this safe is a Reloadable struct that holds just the safe subset. The full config is loaded once at startup; only the reloadable subset is swapped on SIGHUP (or an fsnotify event). Readers fetch the current snapshot atomically. Writers validate, then publish.

package config
 
import (
    "context"
    "fmt"
    "log/slog"
    "os"
    "os/signal"
    "sync/atomic"
    "syscall"
)
 
type Reloadable struct {
    LogLevel        string
    SampleRate      float64
    FeatureFlags    map[string]bool
    RateLimitPerSec int
}
 
type ReloadStore struct {
    current atomic.Pointer[Reloadable]
    load    func() (*Reloadable, error)
}
 
func NewReloadStore(load func() (*Reloadable, error)) (*ReloadStore, error) {
    initial, err := load()
    if err != nil {
        return nil, fmt.Errorf("initial load: %w", err)
    }
    rs := &ReloadStore{load: load}
    rs.current.Store(initial)
    return rs, nil
}
 
func (rs *ReloadStore) Get() *Reloadable {
    return rs.current.Load()
}
 
func (rs *ReloadStore) reload() {
    next, err := rs.load()
    if err != nil {
        slog.Error("reload failed — keeping previous", "error", err)
        return
    }
    if err := validateReloadable(next); err != nil {
        slog.Error("reload validation failed — keeping previous", "error", err)
        return
    }
    rs.current.Store(next)
    slog.Info("config reloaded",
        "log_level", next.LogLevel,
        "sample_rate", next.SampleRate,
        "rate_limit", next.RateLimitPerSec)
}
 
func (rs *ReloadStore) WatchSignals(ctx context.Context) {
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGHUP)
    defer signal.Stop(sig)
 
    for {
        select {
        case <-sig:
            rs.reload()
        case <-ctx.Done():
            return
        }
    }
}
 
func validateReloadable(r *Reloadable) error {
    switch r.LogLevel {
    case "debug", "info", "warn", "error":
    default:
        return fmt.Errorf("invalid log_level %q", r.LogLevel)
    }
    if r.SampleRate < 0 || r.SampleRate > 1 {
        return fmt.Errorf("sample_rate %v out of [0,1]", r.SampleRate)
    }
    if r.RateLimitPerSec < 1 {
        return fmt.Errorf("rate_limit_per_sec must be >= 1")
    }
    return nil
}

atomic.Pointer gives lock-free reads — request handlers call store.Get() once at the top of the handler and use that snapshot for the entire request. That avoids the half-old-half-new bug where a reload happens mid-request and downstream code sees an inconsistent mix.
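The snapshot-per-request rule can be illustrated with a small sketch. The Snapshot type and handleRequest function are hypothetical, and midRequest stands in for a reload that lands while the request is still in flight.

```go
package main

import "sync/atomic"

// Snapshot is an immutable view of the reloadable settings. A reload
// publishes a new Snapshot; it never mutates one that is already published.
type Snapshot struct {
	NewEngine  bool
	SampleRate float64
}

var current atomic.Pointer[Snapshot]

// handleRequest loads the snapshot once at the top and uses only that copy.
// It reports the flag as seen before and after midRequest runs, to show that
// a reload during the request does not change what the request observes.
func handleRequest(midRequest func()) (atStart, atEnd bool) {
	cfg := current.Load()
	atStart = cfg.NewEngine
	midRequest() // e.g. a reload storing a new Snapshot
	atEnd = cfg.NewEngine
	return atStart, atEnd
}
```

The guarantee depends on writers storing a fresh pointer for every reload. Mutating the published struct in place would bring the inconsistent mix straight back.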

SIGHUP is the canonical reload signal in Unix tradition and pairs cleanly with kubectl exec <pod> -- kill -HUP 1 for emergency reloads when you cannot wait for the fsnotify path (this assumes the service runs as PID 1 in its container). Pair it with the file watcher from the earlier section: both feed the same reload() method, so signal-driven and file-driven reloads share one code path and one validation gate.

The non-reloadable subset stays in a separate, immutable struct loaded once at main(). If operators want to change those fields, they update the ConfigMap and let the rolling deploy cycle pods. Trying to hot-reload a database pool size is the same shape of bug as live-patching a running JVM — it works in the demo, it falls over in production.

Frequently Asked Questions

What is Viper's configuration precedence order in Go?

Viper resolves values in this priority order (highest to lowest): explicit Set calls, CLI flags, environment variables, config files, key/value stores, and default values. Higher-priority sources silently override lower ones.

How do you hot-reload configuration in Go with Viper?

Call viper.WatchConfig() with fsnotify to watch the config file for changes. Use viper.OnConfigChange() to register a callback that re-validates and atomically swaps the config struct when the file is modified.

Should I use Viper or koanf for Go configuration?