Go Configuration Mastery: Production Patterns with Viper
The classic Viper config-drift incident: a team needs to flip a feature flag during peak traffic, say to disable a heavy ML model during a sale. They update the Kubernetes ConfigMap. Nothing happens. The service uses Viper, but nobody called viper.WatchConfig(). By the time the deployment rolls 200 pods, the window has closed and the rollback is more expensive than the original problem. We've debugged variants of this on multiple Go services.
Configuration drift costs. A missed WatchConfig() call. An env var silently overriding a database host. A rotated secret that the service still doesn't know about. Viper solves these problems — but only if you set it up correctly.
This guide covers the production pattern: typed config structs with validation, the Viper precedence hierarchy, Kubernetes integration, live reload for feature flags, and zero-restart secrets rotation.
Use a typed config struct with go-playground/validator validation, set explicit precedence via SetEnvPrefix and SetEnvKeyReplacer, watch ConfigMaps for changes with WatchConfig + OnConfigChange, and validate new config before swapping.
- Unmarshal config into a struct; never scatter viper.GetString() calls
- Test config loading with table-driven tests covering env var overrides and invalid values
- Rotate secrets by watching mounted files, not environment variables
graph TD
Defaults[Defaults<br/>SetDefault] -->|lowest priority| Resolve{Viper.Get}
KV[K/V store<br/>etcd / Consul] -->|↑| Resolve
File[Config file<br/>config.yaml] -->|↑| Resolve
Env[Env vars<br/>SetEnvPrefix] -->|↑| Resolve
Flag[CLI flags<br/>BindPFlag] -->|↑| Resolve
Set[Explicit Set] -->|highest| Resolve
Resolve --> Cfg[Typed Config struct<br/>+ validator tags]
Cfg -->|fail closed| Boot[Service starts]
Watch[fsnotify on file] -.->|change| OnChange[OnConfigChange]
OnChange -.->|re-unmarshal + revalidate| Cfg
style Set fill:#fee
style Cfg fill:#efe
style Boot fill:#efe
The diagram is the precedence ladder + the hot-reload loop. The "fail closed" arrow is the discipline: if validation fails after a reload, keep the old config rather than swap in something invalid. That's the bug most teams miss. [Viper docs]
When to Use Viper
Viper is right for services with hierarchical, multi-source config (files, env vars, flags, remote stores) [Viper docs]. Use it when:
- Config spans files, environment variables, and flags
- You need a typed struct with nested fields
- Hot reload matters (feature flags, timeouts)
- You deploy to Kubernetes with ConfigMaps and Secrets
Pick the right tool by counting the sources you actually need:
| Tool | Sources | Hot reload | Best for |
|---|---|---|---|
| os.Getenv | env vars only | No | Under 10 flat values; quick scripts; no struct |
| envconfig | env vars + struct tags | No | Single-source services with typed config |
| koanf | files / env / KV / flags | Yes (per-source) | New projects wanting a modern modular API |
| Viper | files / env / flags / KV stores | Yes via WatchConfig | Multi-source; hot-reload; large existing Go ecosystems |
| spf13/cobra + viper | + CLI subcommand binding | Yes | CLI tools that also need structured config files |
Use os.Getenv for < 10 flat config values. Use koanf for new projects wanting a modern API. For existing Viper codebases, stay: v1 is stable and widely deployed.
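For the os.Getenv end of that spectrum, a minimal stdlib-only sketch might look like this. The helper names (envOr, envIntOr) and the variable names APP_PORT and APP_LOG_LEVEL are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envOr returns the value of key, or fallback when the variable is unset.
// os.LookupEnv distinguishes "unset" from "set to empty"; os.Getenv cannot.
func envOr(key, fallback string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return fallback
}

// envIntOr parses key as an int. A set-but-invalid value is an error rather
// than a silent fallback to the default.
func envIntOr(key string, fallback int) (int, error) {
	raw, ok := os.LookupEnv(key)
	if !ok {
		return fallback, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return n, nil
}

func main() {
	port, err := envIntOr("APP_PORT", 8080) // hypothetical variable name
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(envOr("APP_LOG_LEVEL", "info"), port)
}
```

Once the number of helpers like these starts to grow, or values need nesting, that is usually the signal to move to a struct-based loader.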
Precedence Hierarchy
Viper resolves in this order (highest to lowest): explicit Set() calls, CLI flags, environment variables, config files, key/value stores, defaults [Viper docs].
graph TD
Code[viper.Get database.host] --> P1{Was Set explicitly?}
P1 -->|Yes| Win1[Return explicit value]
P1 -->|No| P2{Bound to flag<br/>and flag passed?}
P2 -->|Yes| Win2[Return flag value]
P2 -->|No| P3{Env var set?<br/>APP_DATABASE_HOST}
P3 -->|Yes| Win3[Return env value]
P3 -->|No| P4{Config file<br/>has key?}
P4 -->|Yes| Win4[Return file value]
P4 -->|No| P5{Remote KV store?<br/>etcd, consul}
P5 -->|Yes| Win5[Return remote value]
P5 -->|No| P6{SetDefault called?}
P6 -->|Yes| Win6[Return default]
P6 -->|No| Zero[Zero value]
style Win3 fill:#dfd
style Win4 fill:#dfd
style Win6 fill:#ffd
style Zero fill:#fdd
In a Kubernetes deploy this ladder plays out as follows: image-baked YAML provides the safe defaults, and env vars sourced from ConfigMaps and Secrets override the file per environment. Both arrive at the same env-var precedence level, above the config file and below CLI flags and explicit Set calls. You never rebuild the image just to change non-secret config.
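To make the resolution order concrete, here is a minimal stdlib-only sketch of a layered lookup. It is an illustration of the idea, not Viper's implementation; the source type, the resolve function, and the sample values are all hypothetical.

```go
package main

import "fmt"

// source is one configuration layer: a name plus the keys it defines.
type source struct {
	name   string
	values map[string]string
}

// resolve walks the layers from highest to lowest priority and returns the
// first hit, together with the name of the layer that supplied it.
func resolve(layers []source, key string) (value, from string, ok bool) {
	for _, s := range layers {
		if v, found := s.values[key]; found {
			return v, s.name, true
		}
	}
	return "", "", false
}

func main() {
	// Ordered highest to lowest, matching the precedence list.
	layers := []source{
		{"set", nil},
		{"flag", nil},
		{"env", map[string]string{"database.host": "db.prod.internal"}},
		{"file", map[string]string{"database.host": "localhost", "database.port": "5432"}},
		{"kv", nil},
		{"default", map[string]string{"database.port": "5432", "service.port": "8080"}},
	}
	for _, key := range []string{"database.host", "database.port", "service.port", "missing"} {
		v, from, ok := resolve(layers, key)
		fmt.Println(key, v, from, ok)
	}
}
```

Returning the name of the winning layer alongside the value is the useful part during an incident: it answers "where did this value come from" without guessing.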
Key gotcha: Viper lowercases all keys internally. APP_DATABASE_HOST and app_database_host collide. Use SetEnvPrefix("APP") and SetEnvKeyReplacer(strings.NewReplacer(".", "_")) consistently to avoid silent overrides.
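The mapping from a config key to an environment variable name can be sketched as below, assuming the SetEnvPrefix("APP") and "." to "_" replacer setup described above. The envName function is a hypothetical approximation of the documented behavior, not Viper's own code.

```go
package main

import (
	"fmt"
	"strings"
)

// envName approximates how a dotted config key maps to an environment
// variable name when a prefix and a "." -> "_" replacer are configured.
func envName(prefix, key string) string {
	name := strings.NewReplacer(".", "_").Replace(key)
	if prefix != "" {
		name = prefix + "_" + name
	}
	return strings.ToUpper(name)
}

func main() {
	keys := []string{"database.host", "Database.Host", "database.pool.max_connections"}
	for _, key := range keys {
		fmt.Printf("%-32s -> %s\n", key, envName("APP", key))
	}
}
```

Keys that differ only in case land on the same variable name, which is why a single consistent prefix and replacer matter.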
Typed Config Struct Pattern
Never scatter viper.GetString("database.host") throughout your codebase. Unmarshal into a typed struct at startup:
package config
import (
"fmt"
"log/slog"
"strings"
"time"
"github.com/go-playground/validator/v10"
"github.com/spf13/viper"
)
type Config struct {
Service ServiceConfig `mapstructure:"service"`
Database DatabaseConfig `mapstructure:"database"`
Features FeatureConfig `mapstructure:"features"`
Monitoring MonitoringConfig `mapstructure:"monitoring"`
}
type ServiceConfig struct {
Name string `mapstructure:"name" validate:"required"`
Version string `mapstructure:"version" validate:"required,semver"`
Port int `mapstructure:"port" validate:"required,gte=1024,lte=65535"`
Timeout time.Duration `mapstructure:"timeout" validate:"required,min=1s,max=60s"`
}
type DatabaseConfig struct {
Host string `mapstructure:"host" validate:"required,hostname|ip"`
Port int `mapstructure:"port" validate:"required,gte=1,lte=65535"`
User string `mapstructure:"user" validate:"required"`
Password string `mapstructure:"password" validate:"required,min=8"`
Name string `mapstructure:"name" validate:"required"`
SSLMode string `mapstructure:"ssl_mode" validate:"oneof=disable require verify-ca verify-full"`
Pool struct {
MaxConnections int `mapstructure:"max_connections" validate:"gte=1,lte=500"`
IdleTimeout time.Duration `mapstructure:"idle_timeout"`
} `mapstructure:"pool"`
}
type FeatureConfig struct {
NewRecommendationEngine bool `mapstructure:"new_recommendation_engine"`
ABTestingEnabled bool `mapstructure:"ab_testing_enabled"`
CacheTTL time.Duration `mapstructure:"cache_ttl"`
}
type MonitoringConfig struct {
MetricsEnabled bool `mapstructure:"metrics_enabled"`
TracingSampleRate float64 `mapstructure:"tracing_sample_rate" validate:"gte=0,lte=1"`
HealthCheckInterval time.Duration `mapstructure:"health_check_interval"`
}
func Load(configPath string) (*Config, error) {
v := viper.New()
setDefaults(v)
v.SetConfigName("config")
v.SetConfigType("yaml")
v.AddConfigPath(configPath)
v.AddConfigPath(".")
v.AutomaticEnv()
v.SetEnvPrefix("APP")
v.SetEnvKeyReplacer(strings.NewReplacer(".", "_"))
// AutomaticEnv only overrides keys Viper already knows from a default, the
// config file, or a flag. Keys that arrive *only* via env (DB credentials,
// name) are invisible to Unmarshal unless bound explicitly — otherwise they
// silently stay empty. See spf13/viper#761.
for _, key := range []string{"database.user", "database.password", "database.name"} {
_ = v.BindEnv(key)
}
if err := v.ReadInConfig(); err != nil {
if _, ok := err.(viper.ConfigFileNotFoundError); !ok {
return nil, fmt.Errorf("error reading config file: %w", err)
}
// Config file is optional — environment variables and defaults are sufficient
}
var cfg Config
if err := v.Unmarshal(&cfg); err != nil {
return nil, fmt.Errorf("unable to decode config: %w", err)
}
if err := validate(cfg); err != nil {
return nil, fmt.Errorf("config validation failed: %w", err)
}
return &cfg, nil
}
func setDefaults(v *viper.Viper) {
v.SetDefault("service.port", 8080)
v.SetDefault("service.timeout", "30s")
v.SetDefault("database.port", 5432)
v.SetDefault("database.ssl_mode", "require")
v.SetDefault("database.pool.max_connections", 25)
v.SetDefault("database.pool.idle_timeout", "10m")
v.SetDefault("features.cache_ttl", "5m")
v.SetDefault("monitoring.metrics_enabled", true)
v.SetDefault("monitoring.tracing_sample_rate", 0.1)
v.SetDefault("monitoring.health_check_interval", "30s")
}
Validate Early, Fail Fast
Use go-playground/validator to validate the struct after Unmarshal. Run field-level validation (required, gte, oneof tags) plus cross-field rules (e.g., if A/B testing enabled, recommendation engine must be enabled). Validation failure blocks startup — far better than discovering invalid config under load.
func validate(cfg Config) error {
v := validator.New()
if err := v.Struct(cfg); err != nil {
return err
}
// Cross-field: if A/B testing enabled, recommendation engine must be too
if cfg.Features.ABTestingEnabled && !cfg.Features.NewRecommendationEngine {
return fmt.Errorf("ab_testing requires new_recommendation_engine")
}
return nil
}
Log Resolved Config (Redacted)
Log what config the service actually loaded at startup so operators can verify it during incidents. Critical rule: redact all secrets.
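// LogConfig logs the resolved config. configFile should be the result of
// ConfigFileUsed() on the *viper.Viper instance that loaded cfg. The
// package-level viper.ConfigFileUsed() only reports on the global instance,
// which Load does not use.
func LogConfig(cfg *Config, configFile string) {
	redacted := *cfg
	redacted.Database.Password = "***REDACTED***"
	slog.Info("config loaded",
		"service", redacted.Service.Name,
		"port", redacted.Service.Port,
		"db.host", redacted.Database.Host,
		"db.ssl_mode", redacted.Database.SSLMode,
		"source", configFile,
	)
}
For many secrets, automate redaction with struct tags (log:"redact") and reflection rather than relying on manual care.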
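A tag-driven redactor can be sketched with the reflect package as follows. The Settings and DB types, the Redacted function, and the flattened dotted-key output are hypothetical choices for illustration; the only thing taken from the text above is the log:"redact" tag.

```go
package main

import (
	"fmt"
	"reflect"
)

type DB struct {
	Host     string
	Password string `log:"redact"`
}

type Settings struct {
	Name   string
	APIKey string `log:"redact"`
	DB     DB
}

// redact flattens a struct value into dotted key/value pairs, masking every
// field tagged log:"redact" and recursing into nested structs.
func redact(prefix string, v reflect.Value, out map[string]any) {
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		key := prefix + f.Name
		switch {
		case f.Tag.Get("log") == "redact":
			out[key] = "***REDACTED***"
		case f.Type.Kind() == reflect.Struct:
			redact(key+".", v.Field(i), out)
		default:
			out[key] = v.Field(i).Interface()
		}
	}
}

// Redacted returns a log-safe view of cfg, which must be a struct or a
// non-nil pointer to a struct.
func Redacted(cfg any) map[string]any {
	v := reflect.ValueOf(cfg)
	if v.Kind() == reflect.Pointer {
		v = v.Elem()
	}
	out := make(map[string]any)
	redact("", v, out)
	return out
}

func main() {
	cfg := &Settings{Name: "user-service", APIKey: "k-123", DB: DB{Host: "db", Password: "hunter2"}}
	fmt.Println(Redacted(cfg))
}
```

The design choice here is that new secret fields are safe by annotation: forgetting to update a hand-written redaction function is no longer possible, although forgetting the tag still is.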
Config Files and Kubernetes Integration
Store YAML defaults in the image; override with environment variables from Kubernetes ConfigMaps (non-secrets) and Secrets [Kubernetes docs]:
# config.yaml, baked into the image
service:
  name: "user-service"
  port: 8080
database:
  host: "localhost"
  port: 5432
  ssl_mode: "require"
  # user, password: never in YAML; from the APP_DATABASE_USER and APP_DATABASE_PASSWORD env vars
features:
  new_recommendation_engine: false
In Kubernetes, env vars sourced from ConfigMaps and Secrets both override these file defaults:
env:
  # From ConfigMap
  - name: APP_DATABASE_HOST
    valueFrom:
      configMapKeyRef:
        name: database-config
        key: host
  # From Secret
  - name: APP_DATABASE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: password
Never put passwords, API keys, or TLS certs in YAML files.
Live Reload for Feature Flags
Watch config files for changes with WatchConfig() + OnConfigChange() [Viper docs]. Validate new config before swapping:
type Manager struct {
mu sync.RWMutex
config Config
}
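// Watch registers the reload callback on the same *viper.Viper that loaded
// the initial config, then starts the file watcher.
func (m *Manager) Watch(v *viper.Viper) {
	v.OnConfigChange(func(e fsnotify.Event) {
		var newCfg Config
		if err := v.Unmarshal(&newCfg); err != nil {
			slog.Error("reload decode failed, keeping previous", "error", err)
			return
		}
		if err := validate(newCfg); err != nil {
			slog.Error("reload validation failed, keeping previous", "error", err)
			return
		}
		m.mu.Lock()
		m.config = newCfg
		m.mu.Unlock()
		slog.Info("config reloaded", "file", e.Name)
	})
	v.WatchConfig()
}
For read-heavy services, use sync/atomic.Value (or atomic.Pointer[Config] on Go 1.19+) instead of sync.RWMutex: each read becomes a single atomic load with no lock contention. Live reload is valid only for feature flags, sample rates, and timeouts. Never live-reload database credentials or TLS certs; use file-based secrets rotation (below) instead.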
Secrets Rotation Without Restart
Kubernetes Secrets as environment variables don't update after the pod starts. Mount secrets as files instead, watch them with fsnotify [Kubernetes docs], and reconnect when rotation occurs:
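// Watch rotates the pool held in pool whenever the mounted secret changes.
// Callers obtain the live pool with pool.Load().
func Watch(ctx context.Context, secretPath string, pool *atomic.Pointer[pgxpool.Pool]) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return fmt.Errorf("create watcher: %w", err)
	}
	defer watcher.Close()
	// Kubernetes updates mounted Secrets by swapping a symlink in the mount
	// directory, so watch the directory rather than the file itself.
	if err := watcher.Add(filepath.Dir(secretPath)); err != nil {
		return fmt.Errorf("watch secret dir: %w", err)
	}
	for {
		select {
		case event := <-watcher.Events:
			if event.Op&(fsnotify.Create|fsnotify.Write) != 0 {
				if err := rotate(ctx, secretPath, pool); err != nil {
					slog.Error("rotate failed", "error", err)
				}
			}
		case err := <-watcher.Errors:
			slog.Error("secret watcher error", "error", err)
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
func rotate(ctx context.Context, secretPath string, pool *atomic.Pointer[pgxpool.Pool]) error {
	creds := loadCredentials(secretPath)
	newPool, err := pgxpool.New(ctx, creds.ConnString())
	if err != nil {
		return fmt.Errorf("create pool: %w", err)
	}
	if err := newPool.Ping(ctx); err != nil { // verify the new credentials work
		newPool.Close()
		return fmt.Errorf("ping with new credentials: %w", err)
	}
	// Publish the new pool first, then close the old one after a grace period.
	if old := pool.Swap(newPool); old != nil {
		time.AfterFunc(5*time.Second, old.Close)
	}
	return nil
}
Mount the secret as a file volume: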
volumes:
  - name: db-secret
    secret:
      secretName: database-secret
containers:
  - volumeMounts:
      - name: db-secret
        mountPath: /run/secrets/db
Testing Config Loading
Use table-driven tests to catch invalid configs before production:
func TestLoad(t *testing.T) {
	tests := []struct {
		name    string
		yaml    string
		envVars map[string]string
		wantErr string
	}{
		{
			name:    "valid config",
			yaml:    validYAML,
			envVars: map[string]string{"APP_DATABASE_PASSWORD": "secure123"},
		},
		{
			name: "missing password",
			yaml: validYAML,
			// validator errors name the failing field and tag, so match on the field.
			wantErr: "Password",
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			dir := t.TempDir()
			if err := os.WriteFile(filepath.Join(dir, "config.yaml"), []byte(tt.yaml), 0o644); err != nil {
				t.Fatal(err)
			}
			for k, v := range tt.envVars {
				t.Setenv(k, v)
			}
			_, err := config.Load(dir)
			if tt.wantErr == "" {
				if err != nil {
					t.Fatalf("unexpected error: %v", err)
				}
				return
			}
			// Check err for nil before calling err.Error() to avoid a panic.
			if err == nil || !strings.Contains(err.Error(), tt.wantErr) {
				t.Fatalf("expected error containing %q, got %v", tt.wantErr, err)
			}
		})
	}
}
Use t.Setenv (auto-restore) + t.TempDir for isolation. Run tests with -race to catch concurrent access bugs.
Common Pitfalls
- Silent key collisions: Viper lowercases all keys. APP_API_KEY and app_api_key collide. Use SetEnvPrefix("APP") consistently.
- Boolean ambiguity: Env vars are strings. Viper treats "true", "1", "t" as true, but an empty string "" silently becomes false. Validate with go-playground/validator.
- Init-function antipattern: Never load config in init(); it makes testing impossible. Pass config as a dependency.
- Secret leakage: If validation fails on a struct with DB credentials, error messages may dump them. Redact connection strings before logging.
- Scattered GetString calls: Global state smell. Always unmarshal to a struct and pass it around.
- Debugging precedence: Use viper.Debug() to print resolution order. It is invaluable when env vars silently override files.
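For the boolean-ambiguity pitfall, one option is to parse flag-like env vars strictly before they ever reach the config struct. The sketch below uses only strconv.ParseBool from the standard library; the boolEnv and parseBool helpers and the APP_FEATURE_X name are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// boolEnv reads a boolean from the environment and refuses to guess: unset,
// empty, or unparsable values are errors rather than a silent false.
func boolEnv(key string) (bool, error) {
	raw, ok := os.LookupEnv(key)
	if !ok {
		return false, fmt.Errorf("%s is not set", key)
	}
	return parseBool(key, raw)
}

// parseBool wraps strconv.ParseBool with an error that names the variable.
func parseBool(key, raw string) (bool, error) {
	b, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("%s=%q is not a boolean: %w", key, raw, err)
	}
	return b, nil
}

func main() {
	for _, raw := range []string{"true", "1", "t", "", "yes"} {
		b, err := parseBool("APP_FEATURE_X", raw)
		fmt.Printf("%q -> %v, err=%v\n", raw, b, err)
	}
	if _, err := boolEnv("APP_FEATURE_X"); err != nil {
		fmt.Println(err)
	}
}
```

Whether an unset flag should be an error or fall back to a documented default is a policy decision; the point is that the choice is explicit instead of an accident of string conversion.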
Production Checklist
- Config unmarshalled into typed struct; no scattered viper.GetString calls
- Required fields + cross-field rules validated with go-playground/validator
- Secrets in environment variables or file mounts, never in YAML
- Startup log prints resolved config with secrets redacted
- SetEnvPrefix("APP") + SetEnvKeyReplacer(strings.NewReplacer(".", "_")) set
- Live reload validates config before swapping; invalid reloads log error and keep previous
- File-based secrets with fsnotify watcher + pool-swap rotation (not env vars)
- Table-driven tests: missing fields, out-of-range, env var overrides, cross-field rules
- viper.Debug() available for troubleshooting precedence in staging
Secrets Management Beyond Environment Variables
Environment variables are the floor, not the ceiling. They leak through /proc/[pid]/environ, they show up in crash dumps, they require a pod restart to rotate, and they get baked into stack traces. For anything beyond a side project, route credentials through a secrets backend — HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, or Kubernetes Secrets mounted as files. The pattern is the same in every case: fetch at startup, cache in memory, refresh on a TTL, and surface failures loudly.
The HashiCorp Vault pattern uses short-lived database credentials issued by Vault's database secret engine. The service authenticates via Kubernetes auth, requests a lease, and renews before expiry. If renewal fails the service shuts down rather than serve traffic with stale creds — fail closed, never silently degrade.
package secrets
import (
"context"
"fmt"
"log/slog"
"time"
vault "github.com/hashicorp/vault/api"
auth "github.com/hashicorp/vault/api/auth/kubernetes"
)
type VaultClient struct {
client *vault.Client
role string
path string
}
func NewVaultClient(addr, role, k8sPath string) (*VaultClient, error) {
cfg := vault.DefaultConfig()
cfg.Address = addr
client, err := vault.NewClient(cfg)
if err != nil {
return nil, fmt.Errorf("vault client: %w", err)
}
k8sAuth, err := auth.NewKubernetesAuth(role,
auth.WithServiceAccountTokenPath("/var/run/secrets/kubernetes.io/serviceaccount/token"))
if err != nil {
return nil, fmt.Errorf("k8s auth: %w", err)
}
if _, err := client.Auth().Login(context.Background(), k8sAuth); err != nil {
return nil, fmt.Errorf("vault login: %w", err)
}
return &VaultClient{client: client, role: role, path: k8sPath}, nil
}
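func (vc *VaultClient) DatabaseCreds(ctx context.Context) (user, pass string, leaseTTL time.Duration, err error) {
	secret, err := vc.client.Logical().ReadWithContext(ctx, "database/creds/"+vc.role)
	if err != nil {
		return "", "", 0, fmt.Errorf("read creds: %w", err)
	}
	// The read can return a nil secret with a nil error when the path has no data.
	if secret == nil || secret.Data == nil {
		return "", "", 0, fmt.Errorf("read creds: empty response for role %q", vc.role)
	}
	// Checked type assertions: a malformed response becomes an error, not a panic.
	user, okUser := secret.Data["username"].(string)
	pass, okPass := secret.Data["password"].(string)
	if !okUser || !okPass {
		return "", "", 0, fmt.Errorf("read creds: username/password missing or not strings")
	}
	leaseTTL = time.Duration(secret.LeaseDuration) * time.Second
	return user, pass, leaseTTL, nil
}
The renewal loop runs in a background goroutine. Renew at 70% of the lease TTL: early enough to absorb network blips, late enough to avoid hammering Vault. If the renewal fails twice in a row, log loudly and trigger a graceful shutdown so Kubernetes can replace the pod with a fresh lease. [Kubernetes docs]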
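The renewal loop described above is not shown in the original, so the following is a minimal stdlib-only sketch of its shape. The renewFunc type stands in for the real Vault renewal call, and the 10%-of-TTL retry delay after a failure is an assumption of this sketch, not something the text specifies.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// renewFunc is a stand-in for the real lease renewal call. It returns the
// TTL of the renewed lease.
type renewFunc func(ctx context.Context) (time.Duration, error)

// renewDelay returns 70% of the lease TTL.
func renewDelay(ttl time.Duration) time.Duration {
	return ttl * 7 / 10
}

// runRenewal renews at 70% of the TTL until ctx is cancelled. After
// maxFailures consecutive failures it returns an error so the caller can
// begin a graceful shutdown.
func runRenewal(ctx context.Context, ttl time.Duration, renew renewFunc, maxFailures int) error {
	failures := 0
	timer := time.NewTimer(renewDelay(ttl))
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-timer.C:
		}
		next, err := renew(ctx)
		if err != nil {
			failures++
			if failures >= maxFailures {
				return fmt.Errorf("lease renewal failed %d times in a row: %w", failures, err)
			}
			// Assumption: retry after another 10% of the TTL, before expiry.
			timer.Reset(ttl / 10)
			continue
		}
		failures = 0
		ttl = next
		timer.Reset(renewDelay(ttl))
	}
}

// simulateOutage runs the loop against a renew function that always fails
// and reports how many attempts were made before giving up.
func simulateOutage() (attempts int, err error) {
	renew := func(context.Context) (time.Duration, error) {
		attempts++
		return 0, errors.New("vault unreachable")
	}
	err = runRenewal(context.Background(), 50*time.Millisecond, renew, 2)
	return attempts, err
}

func main() {
	attempts, err := simulateOutage()
	fmt.Println("attempts:", attempts, "error:", err)
	// A real service would cancel its root context or invoke its shutdown
	// hook here instead of printing.
}
```

Injecting the renewal call as a function keeps the loop testable without a Vault server; in a real service, renewFunc would wrap the Vault client call.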
For AWS environments, Secrets Manager covers the same use case with IAM authentication. The Go SDK v2 client does not cache secret values by itself, so wrap it in a small in-memory cache like the one below and set the TTL to match your rotation cadence. Never call GetSecretValue on every request: you will hit throttling and add latency to every database query.
package secrets
import (
"context"
"encoding/json"
"sync"
"time"
"github.com/aws/aws-sdk-go-v2/config"
"github.com/aws/aws-sdk-go-v2/service/secretsmanager"
)
type DBCreds struct {
Username string `json:"username"`
Password string `json:"password"`
Host string `json:"host"`
}
type CachedSecrets struct {
mu sync.RWMutex
client *secretsmanager.Client
secrets map[string]cached
ttl time.Duration
}
type cached struct {
creds DBCreds
fetched time.Time
}
func NewCachedSecrets(ctx context.Context, ttl time.Duration) (*CachedSecrets, error) {
cfg, err := config.LoadDefaultConfig(ctx)
if err != nil {
return nil, err
}
return &CachedSecrets{
client: secretsmanager.NewFromConfig(cfg),
secrets: make(map[string]cached),
ttl: ttl,
}, nil
}
func (cs *CachedSecrets) Get(ctx context.Context, id string) (DBCreds, error) {
	// Fast path: serve from the cache while the entry is fresh.
	cs.mu.RLock()
	if c, ok := cs.secrets[id]; ok && time.Since(c.fetched) < cs.ttl {
		cs.mu.RUnlock()
		return c.creds, nil
	}
	cs.mu.RUnlock()
	out, err := cs.client.GetSecretValue(ctx, &secretsmanager.GetSecretValueInput{SecretId: &id})
	if err != nil {
		return DBCreds{}, err
	}
	// SecretString is nil for binary secrets; an empty payload then fails JSON
	// decoding below instead of panicking on a nil dereference.
	var raw string
	if out.SecretString != nil {
		raw = *out.SecretString
	}
	var creds DBCreds
	if err := json.Unmarshal([]byte(raw), &creds); err != nil {
		return DBCreds{}, err
	}
	cs.mu.Lock()
	cs.secrets[id] = cached{creds: creds, fetched: time.Now()}
	cs.mu.Unlock()
	return creds, nil
}
For Kubernetes-native deployments without an external secrets store, the projected service account token plus a Secret mounted as a file gives you the same fail-closed shape with zero extra infrastructure. The catch: rotation depends on the kubelet refreshing the mounted volume, which can take up to 60 seconds. That is fine for most rotations but too slow for emergency credential revocation. For incident response, pair file mounts with a sidecar that polls Vault directly.
Match the backend to your blast radius. Vault for multi-cloud and short-lived dynamic credentials. AWS Secrets Manager when you are already locked into one cloud and need IAM-based access policies. Kubernetes Secrets for low-stakes services where the cluster is the trust boundary anyway.
Configuration Migration Patterns
Configs are not append-only. Fields get renamed, defaults shift, nested objects flatten, and units change from seconds to milliseconds. Every time you ship a breaking config change, somewhere a deploy fails because production still has the old YAML. The discipline that prevents this is treating the config schema like an API: version it, support both shapes during the transition, and remove the old one only after every environment has migrated.
The first lever is a version field at the top of the config struct. The loader inspects the version, runs migrations forward to the current schema, and surfaces a deprecation warning when an old version is detected. The migration ladder is a sequence of pure functions, one per version bump.
package config
import (
"fmt"
"log/slog"
"time"
)
type RawConfig struct {
Version int `mapstructure:"version"`
Raw map[string]interface{} `mapstructure:",remain"`
}
type migration func(map[string]interface{}) (map[string]interface{}, error)
var migrations = []migration{
nil, // v0 — never used
migrateV1ToV2,
migrateV2ToV3,
}
const currentVersion = 3
func Migrate(raw RawConfig) (map[string]interface{}, error) {
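if raw.Version > currentVersion {
return nil, fmt.Errorf("config version %d newer than supported %d; upgrade the binary",
raw.Version, currentVersion)
}
if raw.Version < 1 {
return nil, fmt.Errorf("config version %d is invalid: versions start at 1", raw.Version)
}
data := raw.Raw
for v := raw.Version; v < currentVersion; v++ {
// migrations[v] upgrades schema v to v+1, so index 1 holds migrateV1ToV2.
m := migrations[v]
if m == nil {
return nil, fmt.Errorf("no migration defined for v%d -> v%d", v, v+1)
}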
next, err := m(data)
if err != nil {
return nil, fmt.Errorf("migrate v%d -> v%d: %w", v, v+1, err)
}
slog.Warn("config migrated", "from", v, "to", v+1)
data = next
}
return data, nil
}
// migrateV1ToV2 flattened database.timeout.{read,write} into two top-level fields.
func migrateV1ToV2(in map[string]interface{}) (map[string]interface{}, error) {
db, ok := in["database"].(map[string]interface{})
if !ok {
return in, nil
}
timeout, ok := db["timeout"].(map[string]interface{})
if !ok {
return in, nil
}
db["read_timeout"] = timeout["read"]
db["write_timeout"] = timeout["write"]
delete(db, "timeout")
return in, nil
}
// migrateV2ToV3 changed cache_ttl from int seconds to a duration string.
func migrateV2ToV3(in map[string]interface{}) (map[string]interface{}, error) {
feat, ok := in["features"].(map[string]interface{})
if !ok {
return in, nil
}
if secs, ok := feat["cache_ttl"].(int); ok {
feat["cache_ttl"] = (time.Duration(secs) * time.Second).String()
}
return in, nil
}
The second lever is dual-read during deprecation windows. When you rename database.host to database.primary_host, the loader reads both: it prefers the new name, falls back to the old, and emits a structured warning that operators can grep in logs. Keep the dual-read for at least one full release cycle, ideally two. Remove the alias only after metrics confirm zero environments still use the old field.
A deprecated registry inside the config package centralises the alias mappings. Every alias has an introduction date and a removal target — when CI sees an alias whose removal date has passed, the build fails. That keeps deprecation cleanup from rotting in the backlog forever.
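A registry along those lines might be sketched as follows. Everything here is hypothetical: the alias type, the flat dotted-key map used as the config, and the placeholder removal date. The expired function is the piece a CI test could assert on.

```go
package main

import (
	"fmt"
	"log/slog"
	"time"
)

// alias maps a deprecated key to its replacement, with a removal deadline.
type alias struct {
	Old      string
	New      string
	RemoveBy time.Time
}

// deprecated is the central registry. The date is a placeholder.
var deprecated = []alias{
	{Old: "database.host", New: "database.primary_host",
		RemoveBy: time.Date(2030, time.January, 1, 0, 0, 0, 0, time.UTC)},
}

// lookup prefers the new key, falls back to a registered old key, and emits
// a structured warning whenever the fallback is used.
func lookup(cfg map[string]string, key string) (string, bool) {
	if v, ok := cfg[key]; ok {
		return v, true
	}
	for _, a := range deprecated {
		if a.New != key {
			continue
		}
		if v, ok := cfg[a.Old]; ok {
			slog.Warn("deprecated config key in use",
				"old", a.Old, "new", a.New, "remove_by", a.RemoveBy.Format(time.DateOnly))
			return v, true
		}
	}
	return "", false
}

// expired lists aliases whose removal date has passed. A CI test can fail
// the build when this is non-empty.
func expired(now time.Time) []alias {
	var out []alias
	for _, a := range deprecated {
		if now.After(a.RemoveBy) {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	old := map[string]string{"database.host": "db-legacy.internal"}
	host, ok := lookup(old, "database.primary_host")
	fmt.Println(host, ok)
	fmt.Println("expired aliases:", len(expired(time.Now())))
}
```

Passing now into expired, rather than calling time.Now inside it, keeps the CI check deterministic and easy to test against a fixed date.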
The third lever is a config schema test that validates the on-disk YAML examples (config.example.yaml, config.production.yaml) load cleanly under the current loader. Run it in CI. The test catches schema drift the same release the drift is introduced — not three weeks later when ops rolls out the change.
Hot Reload Safely
Hot reload is the single feature most often overused. The right mental model is a strict allowlist of fields safe to change at runtime — the rest require a full restart so connections, caches, and goroutines tear down cleanly. Reload the wrong field and you ship a service that lies about its configuration: the struct says one thing, the running code is still bound to the old value.
Safe to reload:
- Feature flags (boolean toggles read on every request)
- Sample rates and rate-limit thresholds
- Timeouts and retry budgets that flow through context.WithTimeout
- Log level and log format
- Cache TTLs read at write-time
Unsafe to reload — restart instead:
- Database connection strings, pool sizes, TLS configuration
- HTTP listener addresses, ports, server timeouts already wrapped in http.Server
- Cryptographic keys bound to long-lived signers
- gRPC client targets, mTLS certificates loaded into transport credentials
- Anything that touches a goroutine pool sized at startup
The pattern that makes this safe is a Reloadable interface around just the safe subset. The full config is loaded once at startup; only the reloadable subset is swapped on SIGHUP (or fsnotify event). Readers fetch the current snapshot atomically. Writers validate, then publish.
package config
import (
"context"
"fmt"
"log/slog"
"os"
"os/signal"
"sync/atomic"
"syscall"
)
type Reloadable struct {
LogLevel string
SampleRate float64
FeatureFlags map[string]bool
RateLimitPerSec int
}
type ReloadStore struct {
current atomic.Pointer[Reloadable]
load func() (*Reloadable, error)
}
func NewReloadStore(load func() (*Reloadable, error)) (*ReloadStore, error) {
initial, err := load()
if err != nil {
return nil, fmt.Errorf("initial load: %w", err)
}
rs := &ReloadStore{load: load}
rs.current.Store(initial)
return rs, nil
}
func (rs *ReloadStore) Get() *Reloadable {
return rs.current.Load()
}
func (rs *ReloadStore) reload() {
next, err := rs.load()
if err != nil {
slog.Error("reload failed — keeping previous", "error", err)
return
}
if err := validateReloadable(next); err != nil {
slog.Error("reload validation failed — keeping previous", "error", err)
return
}
rs.current.Store(next)
slog.Info("config reloaded",
"log_level", next.LogLevel,
"sample_rate", next.SampleRate,
"rate_limit", next.RateLimitPerSec)
}
func (rs *ReloadStore) WatchSignals(ctx context.Context) {
sig := make(chan os.Signal, 1)
signal.Notify(sig, syscall.SIGHUP)
defer signal.Stop(sig)
for {
select {
case <-sig:
rs.reload()
case <-ctx.Done():
return
}
}
}
func validateReloadable(r *Reloadable) error {
switch r.LogLevel {
case "debug", "info", "warn", "error":
default:
return fmt.Errorf("invalid log_level %q", r.LogLevel)
}
if r.SampleRate < 0 || r.SampleRate > 1 {
return fmt.Errorf("sample_rate %v out of [0,1]", r.SampleRate)
}
if r.RateLimitPerSec < 1 {
return fmt.Errorf("rate_limit_per_sec must be >= 1")
}
return nil
}
atomic.Pointer gives lock-free reads: request handlers call store.Get() once at the top of the handler and use that snapshot for the entire request. That avoids the half-old-half-new bug where a reload happens mid-request and downstream code sees an inconsistent mix.
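The snapshot-per-request rule can be illustrated in isolation. In this sketch, Flags and handle are hypothetical stand-ins for Reloadable and a request handler, and the reload is simulated by a callback that fires in the middle of the request.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Flags stands in for the reloadable subset of the config.
type Flags struct {
	SampleRate float64
	NewEngine  bool
}

// current holds the latest published snapshot.
var current atomic.Pointer[Flags]

// handle takes one snapshot at the top and uses it for the whole request,
// even if a reload publishes a new snapshot while the request is in flight.
func handle(reloadMidRequest func()) (sampleRate float64, newEngine bool) {
	snap := current.Load() // one atomic load per request
	sampleRate = snap.SampleRate
	reloadMidRequest()         // simulates a reload landing mid-request
	newEngine = snap.NewEngine // still the snapshot taken at the top
	return sampleRate, newEngine
}

func main() {
	current.Store(&Flags{SampleRate: 0.1})
	rate, engine := handle(func() {
		current.Store(&Flags{SampleRate: 1.0, NewEngine: true})
	})
	fmt.Println("request saw:", rate, engine)
	fmt.Println("next request sees:", current.Load().SampleRate, current.Load().NewEngine)
}
```

Had handle called current.Load() a second time after the reload, it would have paired the old sample rate with the new engine flag, which is exactly the inconsistent mix the snapshot avoids.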
SIGHUP is the canonical reload signal in Unix tradition and pairs cleanly with kubectl exec -- kill -HUP 1 for emergency reloads when you cannot wait for the fsnotify path. Pair it with the file watcher from the earlier section — both feed the same reload() method, so signal-driven and file-driven reloads share one code path and one validation gate.
The non-reloadable subset stays in a separate, immutable struct loaded once at main(). If operators want to change those fields, they update the ConfigMap and let the rolling deploy cycle pods. Trying to hot-reload a database pool size is the same shape of bug as live-patching a running JVM — it works in the demo, it falls over in production.
Frequently Asked Questions
What is Viper's configuration precedence order in Go?
Viper resolves values in this priority order (highest to lowest): explicit Set calls, CLI flags, environment variables, config files, key/value stores, and default values. Higher-priority sources silently override lower ones.
How do you hot-reload configuration in Go with Viper?
Call viper.WatchConfig() with fsnotify to watch the config file for changes. Use viper.OnConfigChange() to register a callback that re-validates and atomically swaps the config struct when the file is modified.
Should I use Viper or koanf for Go configuration?
Use Viper for existing codebases or multi-source hierarchical config (files, env, flags, remote stores). Use koanf for new projects that want a modern API with a smaller dependency tree and similar capabilities.
How do you validate configuration in Go with Viper?
Unmarshal Viper values into a typed struct with validation tags, then run a validator (like go-playground/validator) at startup. This catches missing required fields, invalid values, and type mismatches before the service starts serving traffic.
Keep Reading
- Go Graceful HTTP Shutdown: Zero-Downtime Production Patterns - Shutdown guards that block config reloads during drain, and two-phase shutdown sequencing
- Production-Grade Go API Design - Where config gets consumed: the server setup, middleware chains, and health probes that depend on correct configuration
- Go Testing: Table-Driven Tests, Mocks, and Testcontainers - Table-driven tests for config validation, t.Setenv for isolated env var testing, and t.TempDir for config file fixtures
- Go Context Cheat Sheet - context.Context is the call-site cousin of config: long-lived defaults vs request-scoped overrides
- Building Resilient Distributed Systems with Go - Config drives circuit-breaker thresholds, retry budgets, and bulkhead sizes: get config wrong and resilience policies misfire
Engineering Team
A multidisciplinary team of backend engineers, architects, and DevOps practitioners shipping deep dives into distributed systems and production infrastructure.