#caching #redis #system-design #scalability #go #cache-invalidation

Caching Strategies at Scale

BackendBytes Engineering Team

Jan 5, 2025

16 min read

Part of Series: Distributed Systems Mastery

Lesson 5 of 6


Key Takeaways

→Cache stampede: when a hot key's TTL expires during peak traffic, 40K concurrent users all miss the cache at once, sending 40K queries/sec to a database sized for 800; singleflight collapses each pod's concurrent misses to 1 query
→Cache-aside: the app handles misses and populates the cache itself; simple, but has a staleness window. Write-through: always warm, but adds write latency. Write-behind: fastest writes, but accepts data loss on crash
→Singleflight prevents stampedes by letting only one goroutine fetch on a cache miss while the others wait for and reuse its result; probabilistic early expiration spreads refreshes out before the TTL boundary is reached
→Multi-tier caching (L1 in-process LRU + L2 Redis) gives you local speed (microseconds) without sacrificing shareability across instances

Black Friday, 11:02 PM. Cache TTL expires. 40,000 concurrent users hit the database at once: 800 queries/sec → 40,000 queries/sec in 400ms. Database CPUs max. Connection pools exhaust. Site down in 7 minutes.

The cache had made things catastrophically worse than no cache.

This is the cache stampede: a failure mode that hits systems that cache correctly. Expiration happens on schedule, thousands of goroutines race to refill the key, and the database, sized for cached traffic, gets hammered instead.

Caching at scale isn't about happy-path latency (cache hit = fast). It's about the failure modes: stampedes when keys expire, stale data when invalidation misses, cold starts that spike database load, and crashes when Redis dies.

TL;DR

Use cache-aside for reads, write-through for writes. Prevent stampedes with singleflight or probabilistic early expiration. Invalidate with events and CDC, not just TTL. Multi-tier (L1 in-process + L2 Redis) eliminates the speed/shareability trade-off.

Pattern choice sets consistency bounds: cache-aside is eventually consistent, write-through is strong
Stampede defense deduplicates concurrent misses: singleflight reduces 40K queries to 1 per pod
Event-driven invalidation fires on DB changes, not just on timers

Quick Pattern Decision Matrix

Pattern	When to Use	Read Latency	Write Latency	Consistency	Data Loss Risk
Cache-Aside	Read-heavy, general purpose	Fast after warm	Normal	Eventual	None
Write-Through	Session/config data	Fast	Slower	Strong	None
Write-Behind	Analytics, counters	Fast	Fastest	Eventual	Yes (on crash)
Read-Through	Large working sets, complex loading	Fast after warm	Normal	Eventual	None

Choose cache-aside (most common) for read-heavy workloads. Write-through when you need cache always warm. Write-behind for write-heavy, crash-loss-tolerant data. Read-through when loading logic is complex or shared.

The Four Caching Patterns

^{[Redis Docs]}

Pattern 1: Cache-Aside (Lazy Loading)

App checks cache, handles misses by fetching from DB, populates cache itself. Simplest and most common pattern. The application is responsible for both cache interaction and database fallback.

Trade-offs: Read latency on cache miss (app waits for DB fetch + cache populate), staleness window between writes and TTL expiry, and cold-start overhead when cache is empty.

graph TD
    Client["Client Request"] --> App["Application"]
    App -->|"1. GET key"| Cache["Redis Cache"]
    Cache -->|"2a. Cache HIT"| App
    Cache -->|"2b. Cache MISS"| App
    App -->|"3. SELECT * FROM ..."| DB[(PostgreSQL)]
    DB -->|"4. Row data"| App
    App -->|"5. SET key (TTL)"| Cache
    App -->|"6. Response"| Client

func GetProduct(ctx context.Context, id string) (*Product, error) {
    cacheKey := "product:" + id
 
    // Check cache first
    cached, err := cache.Get(ctx, cacheKey)
    if err == nil {
        var product Product
        if err := json.Unmarshal(cached, &product); err == nil {
            return &product, nil
        }
    }
 
    // Cache miss — fetch from DB
    product, err := db.GetProduct(ctx, id)
    if err != nil {
        return nil, err
    }
 
    // Populate cache for next request
    data, _ := json.Marshal(product)
    cache.Set(ctx, cacheKey, data, 5*time.Minute)
    return product, nil
}

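On writes, update the database first, then delete the cache key so the next read repopulates it: cache.Delete(ctx, "product:"+id). This is simple, but readers can still see stale data between the DB write and the delete, so it suits data where brief staleness is acceptable.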

Best for: Read-heavy, staleness-tolerant data (product catalogs, user profiles).
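Both halves of cache-aside, the read path above and the delete-on-write path, fit in one small program. This is a minimal sketch that uses in-memory maps as stand-ins for Redis and PostgreSQL; ttlCache and ProductStore are hypothetical names, and the store is not safe for concurrent use:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ttlCache is a tiny in-memory stand-in for Redis (hypothetical, for illustration).
type ttlCache struct {
	mu   sync.Mutex
	data map[string]entry
}

type entry struct {
	val     string
	expires time.Time
}

func (c *ttlCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.data[key]
	if !ok || time.Now().After(e.expires) {
		return "", false
	}
	return e.val, true
}

func (c *ttlCache) Set(key, val string, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = entry{val, time.Now().Add(ttl)}
}

func (c *ttlCache) Delete(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.data, key)
}

// ProductStore pairs the cache with a map standing in for the database.
// Not concurrency-safe: db and dbReads are unguarded.
type ProductStore struct {
	cache   *ttlCache
	db      map[string]string // id -> product name
	dbReads int
}

// Get is cache-aside: check the cache, fall back to the DB, then populate the cache.
func (s *ProductStore) Get(id string) (string, bool) {
	key := "product:" + id
	if v, ok := s.cache.Get(key); ok {
		return v, true
	}
	s.dbReads++
	v, ok := s.db[id]
	if !ok {
		return "", false
	}
	s.cache.Set(key, v, 5*time.Minute)
	return v, true
}

// Update writes the DB first, then deletes the key so the next read repopulates it.
func (s *ProductStore) Update(id, name string) {
	s.db[id] = name
	s.cache.Delete("product:" + id)
}

func main() {
	s := &ProductStore{cache: &ttlCache{data: map[string]entry{}}, db: map[string]string{"42": "Lamp"}}
	s.Get("42")                 // miss: reads DB, populates cache
	s.Get("42")                 // hit
	s.Update("42", "Desk Lamp") // DB write, then cache delete
	v, _ := s.Get("42")         // miss again: picks up the new value
	fmt.Println(v, s.dbReads)   // Desk Lamp 2
}
```

Deleting instead of overwriting on write means two concurrent writers cannot leave the cache holding the older of their two values. A reader that loaded the old row just before the delete can still write it back afterwards, which is why the TTL stays as a backstop.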

Pattern 2: Write-Through

Writes go to the cache and the DB synchronously, so the cache is always warm. Trade-offs: slower writes (every write pays a round-trip to both the cache and the DB) and memory spent on entries that are written but never read. Best for mutable data that changes frequently and is read right after it changes (sessions, shopping carts, config).
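A minimal write-through sketch, again with in-memory maps standing in for the database and Redis (store, WriteThrough, and the dbFail test hook are hypothetical). It writes the database first so that a failed DB write never leaves a value in the cache that the database doesn't have; that ordering is one reasonable choice, not the only one:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// store is a hypothetical in-memory stand-in for both the database and Redis.
type store struct {
	mu   sync.Mutex
	data map[string]string
}

func newStore() *store { return &store{data: map[string]string{}} }

func (s *store) put(k, v string) { s.mu.Lock(); s.data[k] = v; s.mu.Unlock() }

func (s *store) get(k string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[k]
	return v, ok
}

// WriteThrough updates the DB and the cache in the same request path.
type WriteThrough struct {
	db, cache *store
	dbFail    bool // test hook: simulate a failed DB write
}

var errDB = errors.New("db write failed")

// Save writes the DB first (source of truth), then the cache. If the DB write
// fails, the cache is left untouched and the caller sees the error.
func (w *WriteThrough) Save(k, v string) error {
	if w.dbFail {
		return errDB
	}
	w.db.put(k, v)
	w.cache.put(k, v) // with Redis this can fail too; on error, delete the key instead
	return nil
}

// Load serves from the cache, which Save keeps warm for every written key.
func (w *WriteThrough) Load(k string) (string, bool) {
	if v, ok := w.cache.get(k); ok {
		return v, true
	}
	v, ok := w.db.get(k) // only reached for keys written before the cache existed
	if ok {
		w.cache.put(k, v)
	}
	return v, ok
}

func main() {
	w := &WriteThrough{db: newStore(), cache: newStore()}
	_ = w.Save("session:abc", "user=7")
	v, _ := w.cache.get("session:abc")
	fmt.Println(v) // user=7, readable from the cache immediately after the write
}
```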

Pattern 3: Write-Behind (Write-Back)

Writes go to the cache immediately; a background worker batches them and flushes them to the DB asynchronously. This gives the fastest write latency but accepts data loss on crash: anything not yet flushed is gone. If the write buffer lives in Redis, AOF persistence narrows that window. Best for write-heavy, crash-loss-tolerant workloads (analytics counters, metrics, ephemeral state).
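Counters show the write-behind trade-off most clearly. The sketch below (WriteBehind is a hypothetical name; maps stand in for Redis and the database) serves increments from memory and applies them to the "database" only when Flush runs. In a real service a background goroutine would call Flush on a ticker, e.g. `for range time.Tick(time.Second) { w.Flush() }`:

```go
package main

import (
	"fmt"
	"sync"
)

// WriteBehind buffers counter increments in memory and flushes them to the
// database in batches. Anything still buffered when the process dies is lost.
type WriteBehind struct {
	mu      sync.Mutex
	cache   map[string]int64 // current values served to readers
	pending map[string]int64 // increments not yet written to the DB
	db      map[string]int64 // stand-in for the real database
	flushes int              // number of batch writes issued
}

func NewWriteBehind() *WriteBehind {
	return &WriteBehind{cache: map[string]int64{}, pending: map[string]int64{}, db: map[string]int64{}}
}

// Incr updates the cache immediately and records the delta for a later flush.
func (w *WriteBehind) Incr(key string, delta int64) int64 {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.cache[key] += delta
	w.pending[key] += delta
	return w.cache[key]
}

// Flush applies all pending deltas as one batch and clears the buffer.
func (w *WriteBehind) Flush() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if len(w.pending) == 0 {
		return
	}
	for k, d := range w.pending { // one batched DB write in a real system
		w.db[k] += d
	}
	w.pending = map[string]int64{}
	w.flushes++
}

func main() {
	w := NewWriteBehind()
	for i := 0; i < 3; i++ {
		w.Incr("views:42", 1)
	}
	fmt.Println(w.cache["views:42"], w.db["views:42"]) // 3 0: fast writes, DB behind
	w.Flush()
	fmt.Println(w.db["views:42"], w.flushes) // 3 1: three increments, one DB write
}
```

Batching is where the speed comes from: N increments become one DB write. It is also where the loss comes from, since everything in `pending` disappears if the process dies before the next Flush.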

Pattern 4: Read-Through

The cache layer transparently loads data on a miss via a loader function. The app talks only to the cache, and the cache handles the DB interaction. This encapsulates loading logic but couples the cache to its data sources, and the app still has to invalidate on writes. Best for complex, shared loading logic.
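The difference from cache-aside is who owns the loader. In this minimal generic sketch (ReadThrough and NewReadThrough are hypothetical names), the loader is handed to the cache once and callers only ever call Get; writes still have to call Invalidate, because the cache cannot see DB changes:

```go
package main

import (
	"fmt"
	"sync"
)

// ReadThrough owns the loading logic: callers only ever call Get.
type ReadThrough[V any] struct {
	mu   sync.Mutex
	data map[string]V
	load func(key string) (V, error) // fetches from the source of truth on a miss
}

func NewReadThrough[V any](load func(key string) (V, error)) *ReadThrough[V] {
	return &ReadThrough[V]{data: map[string]V{}, load: load}
}

// Get returns the cached value or invokes the loader and caches its result.
// Errors are not cached, so a failed load is retried on the next call.
func (c *ReadThrough[V]) Get(key string) (V, error) {
	c.mu.Lock()
	if v, ok := c.data[key]; ok {
		c.mu.Unlock()
		return v, nil
	}
	c.mu.Unlock()

	v, err := c.load(key) // called without the lock so slow loads don't block hits
	if err != nil {
		var zero V
		return zero, err
	}
	c.mu.Lock()
	c.data[key] = v
	c.mu.Unlock()
	return v, nil
}

// Invalidate must be called by the write path; the cache cannot see DB writes.
func (c *ReadThrough[V]) Invalidate(key string) {
	c.mu.Lock()
	delete(c.data, key)
	c.mu.Unlock()
}

func main() {
	calls := 0
	c := NewReadThrough(func(key string) (string, error) {
		calls++
		return "row for " + key, nil
	})
	c.Get("user:7")
	v, _ := c.Get("user:7")
	fmt.Println(v, calls) // row for user:7 1
}
```

As written, concurrent misses on the same key each call the loader; in production a read-through layer is usually combined with the singleflight deduplication described in the next section.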

Preventing Cache Stampedes

Cache stampedes^{[Cache stampede paper]} occur when a popular key expires at peak traffic. All concurrent requests discover the miss simultaneously and race to refill the cache, all querying the database at once. At 40,000 req/s, every request that arrives before the first refill completes goes to the database: up to 40,000 queries per second against a database sized for cached traffic (800 req/s baseline).

The standard defense is request deduplication via singleflight^{[Go singleflight]}. If 10,000 goroutines simultaneously discover a cache miss for the same key, only one queries the database. The other 9,999 wait for the result and receive the same response.

sequenceDiagram
    participant R1 as Req 1
    participant R2 as Req 2..N
    participant SF as singleflight
    participant DB as Database

    Note over R1,R2: popular key expires
    R1->>SF: Do("product:42", fetchFn)
    R2->>SF: Do("product:42", fetchFn)
    Note over SF: N-1 callers wait on<br/>the in-flight Do
    SF->>DB: fetchFn() ← only ONE query
    DB-->>SF: row
    SF-->>R1: result
    SF-->>R2: same result (shared=true)

Without singleflight, N concurrent misses = N database queries. With it, N misses = 1 query + N-1 subscribers.

import "golang.org/x/sync/singleflight"
 
var group singleflight.Group
 
func GetProduct(ctx context.Context, id string) (*Product, error) {
    cacheKey := "product:" + id

    // Fast path: cache hit (a corrupt entry is treated as a miss)
    if val, err := cache.Get(ctx, cacheKey); err == nil {
        var product Product
        if err := json.Unmarshal(val, &product); err == nil {
            return &product, nil
        }
    }

    // Deduplicate concurrent misses: only one goroutine fetches from DB.
    // fn runs with the first caller's ctx; if that caller is cancelled,
    // every waiter receives the cancellation error.
    v, err, _ := group.Do(cacheKey, func() (interface{}, error) {
        // Double-check: another goroutine may have populated the cache
        // between our miss and entering Do
        if val, err := cache.Get(ctx, cacheKey); err == nil {
            var product Product
            if err := json.Unmarshal(val, &product); err == nil {
                return &product, nil
            }
        }

        product, err := db.GetProduct(ctx, id)
        if err != nil {
            return nil, err
        }
        if data, err := json.Marshal(product); err == nil {
            cache.Set(ctx, cacheKey, data, 5*time.Minute) // best effort
        }
        return product, nil
    })

    if err != nil {
        return nil, err
    }
    // The discarded third value (shared) is true if this result was shared with
    // other concurrent callers; it is handy for metrics but unused here.
    return v.(*Product), nil
}

singleflight is process-level: each API pod deduplicates its own concurrent misses. In a fleet of 100 pods, the 40,000 concurrent misses at TTL expiry (~400 per pod) collapse to 1-2 DB queries per pod (~100-200 fleet-wide) instead of 40,000, a 200-400× reduction. The bookkeeping cost is small enough to be worth paying on any key accessed more than a few times per second. ^{[Cache stampede paper]}

Invalidation Without Timers

^{[Apache Kafka Docs]}

TTL-based invalidation alone is dangerous when data changes frequently. Setting a 5-minute TTL on product prices means customers see stale prices for up to 5 minutes after an admin price change. For critical data, you need invalidation that fires on writes, not timers.

Event-driven invalidation: After every database write, publish an invalidation event. A subscriber (which can be the same service) deletes or updates the affected cache key.

func (s *ProductService) UpdateProduct(ctx context.Context, product *Product) error {
    // Write to DB (the source of truth)
    if err := s.db.UpdateProduct(ctx, product); err != nil {
        return err
    }
    // Publish invalidation event; the subscriber deletes the key from cache
    event := map[string]string{"type": "product", "id": product.ID}
    data, err := json.Marshal(event)
    if err == nil {
        err = s.pubsub.Publish(ctx, "cache.invalidations", data)
    }
    if err != nil {
        // Log but don't fail the write: the cache will expire via TTL as a fallback
        log.Printf("cache invalidation for product %s not published: %v", product.ID, err)
    }
    return nil
}

// Subscriber (runs in background)
func (s *InvalidationService) ProcessInvalidations(ctx context.Context) error {
    return s.pubsub.Subscribe(ctx, "cache.invalidations", func(data []byte) error {
        var event map[string]string
        if err := json.Unmarshal(data, &event); err != nil {
            return fmt.Errorf("decode invalidation event: %w", err)
        }
        cacheKey := event["type"] + ":" + event["id"]
        return s.cache.Delete(ctx, cacheKey)
    })
}

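Limitation: there is a brief window between the DB write and the cache invalidation in which readers see stale data, and if the process crashes after the commit but before Publish, the invalidation is lost and only the TTL cleans up. To make invalidation as durable as the write itself, record it in the same DB transaction with the Outbox pattern, or derive it from the database log with CDC.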

CDC-based invalidation (most reliable): Tools like Debezium read the database WAL (write-ahead log) and publish every row change as an event. This catches all DB updates regardless of origin — API writes, direct SQL, migrations, batch processes.
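The consumer side of CDC is mostly a mapping from row changes to cache keys. A minimal sketch, assuming Debezium's JSON converter with schemas disabled (so each message value is the bare envelope with `op`, `before`, `after`, and `source.table`), an `id` column on every tracked table, and a hypothetical table-to-prefix map:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// changeEvent models the parts of a Debezium change-event envelope used here.
type changeEvent struct {
	Op     string         `json:"op"` // c=create, u=update, d=delete, r=snapshot read
	Before map[string]any `json:"before"`
	After  map[string]any `json:"after"`
	Source struct {
		Table string `json:"table"`
	} `json:"source"`
}

// keyPrefix maps table names to cache-key prefixes (hypothetical naming).
var keyPrefix = map[string]string{"products": "product", "users": "user"}

// keysToInvalidate returns the cache keys affected by one row change.
// Tombstones (a JSON null value after a delete) and untracked tables yield no keys.
func keysToInvalidate(raw []byte) ([]string, error) {
	var ev changeEvent
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.UseNumber() // keep numeric ids exact instead of converting to float64
	if err := dec.Decode(&ev); err != nil {
		return nil, fmt.Errorf("decode change event: %w", err)
	}
	prefix, ok := keyPrefix[ev.Source.Table]
	if !ok || ev.Op == "r" { // untracked table, or snapshot read (nothing changed)
		return nil, nil
	}
	row := ev.After
	if ev.Op == "d" { // deletes carry the old row only in "before"
		row = ev.Before
	}
	id, ok := row["id"]
	if !ok {
		return nil, fmt.Errorf("table %s: change event has no id column", ev.Source.Table)
	}
	return []string{fmt.Sprintf("%s:%v", prefix, id)}, nil
}

func main() {
	ev := []byte(`{"op":"u","before":{"id":42,"price":10},"after":{"id":42,"price":12},"source":{"table":"products"}}`)
	keys, err := keysToInvalidate(ev)
	fmt.Println(keys, err) // [product:42] <nil>
}
```

Because the events come from the WAL, this mapping also fires for changes made by migrations or ad-hoc SQL, which is the property that makes CDC the most reliable option.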

Tag-based invalidation: Store cache keys under tags. When an entity changes, invalidate its tag to delete all related keys at once. Solves "one write, many cache keys" (e.g., updating a product invalidates detail page + inventory + category listing + search results).
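Tag-based invalidation needs a reverse index from tag to keys. In Redis this is commonly a set per tag (SADD on write, then SMEMBERS and DEL on invalidation); the in-process sketch below (TaggedCache is a hypothetical name) shows the same bookkeeping:

```go
package main

import (
	"fmt"
	"sync"
)

// TaggedCache keeps a reverse index from tag to keys so one write can evict
// every dependent entry.
type TaggedCache struct {
	mu     sync.Mutex
	values map[string]string
	tags   map[string]map[string]struct{} // tag -> set of keys
}

func NewTaggedCache() *TaggedCache {
	return &TaggedCache{values: map[string]string{}, tags: map[string]map[string]struct{}{}}
}

// Set stores a value and registers the key under each tag.
func (c *TaggedCache) Set(key, val string, tags ...string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[key] = val
	for _, t := range tags {
		if c.tags[t] == nil {
			c.tags[t] = map[string]struct{}{}
		}
		c.tags[t][key] = struct{}{}
	}
}

func (c *TaggedCache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.values[key]
	return v, ok
}

// InvalidateTag deletes every key registered under tag and returns how many
// were still present. Memberships of those keys in other tags are left behind;
// they are harmless because deleting an absent key is a no-op.
func (c *TaggedCache) InvalidateTag(tag string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	n := 0
	for key := range c.tags[tag] {
		if _, ok := c.values[key]; ok {
			delete(c.values, key)
			n++
		}
	}
	delete(c.tags, tag)
	return n
}

func main() {
	c := NewTaggedCache()
	c.Set("product:42", "detail", "product:42")
	c.Set("inventory:42", "7 left", "product:42")
	c.Set("category:lamps", "[42 43]", "product:42", "product:43")
	c.Set("product:43", "detail", "product:43")
	fmt.Println(c.InvalidateTag("product:42")) // 3
}
```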

Multi-Tier Caching

A single cache tier forces a trade-off: in-process caches (L1) are fast (~1 microsecond) but limited to one machine's memory and go cold on restart; Redis^{[Redis Docs]} (L2) is shared across servers but adds network latency (~0.5 milliseconds). Multi-tier caching eliminates this trade-off: check L1 first (fast), fall through to L2 (shared state), then to the database.

L1 absorbs hot-key traffic. L2 provides shared state across instances. Together, they give you both speed and scale.

// l1 is an expirable LRU (github.com/hashicorp/golang-lru/v2/expirable), which
// evicts by size and by age, e.g. expirable.NewLRU[string, []byte](10_000, nil, 2*time.Second).
// A plain lru.Cache has no TTL, so L1 entries would never expire on their own.
type TieredCache struct {
	l1 *expirable.LRU[string, []byte]
	l2 *redis.Client
}

func (tc *TieredCache) Get(ctx context.Context, key string) ([]byte, error) {
	// L1: in-process (fast, but not shared)
	if val, ok := tc.l1.Get(key); ok {
		return val, nil
	}

	// L2: Redis (shared, but slower)
	val, err := tc.l2.Get(ctx, key).Bytes()
	if errors.Is(err, redis.Nil) {
		return nil, ErrCacheMiss
	}
	if err != nil {
		return nil, err
	}

	// Promote to L1 (requests on this pod hit L1 until the L1 TTL expires)
	tc.l1.Add(key, val)
	return val, nil
}

func (tc *TieredCache) Set(ctx context.Context, key string, value []byte, l2TTL time.Duration) error {
	// Write to both tiers
	tc.l1.Add(key, value)
	return tc.l2.Set(ctx, key, value, l2TTL).Err()
}

// Delete removes the key from this pod's L1 and from L2. Other pods' L1 copies
// are not touched; they keep serving the old value until their L1 TTL expires.
func (tc *TieredCache) Delete(ctx context.Context, key string) error {
	tc.l1.Remove(key)
	return tc.l2.Del(ctx, key).Err()
}

Critical: the L1 TTL must be much shorter than the L2 TTL (e.g., 2 seconds vs 5 minutes). Deleting a key from L2 does not reach the L1 copies on other pods, so those pods keep serving the old value until their L1 entry expires; a short L1 TTL bounds that staleness to a couple of seconds.

When NOT to Cache

^{[PostgreSQL Docs]}

Caching the wrong data is worse than no cache. Avoid caching in these scenarios:

Write-heavy, read-rare data. If a key is written 100 times/sec and read once/sec, you pay invalidation overhead 100x for every 1x benefit. Examples: last-activity timestamps (write on every request), inventory counts (write on every sale, read occasionally), auction bids. The cost of cache coherence exceeds the read speedup. ^{[Dean & Barroso, 2013]}

Strong-consistency data. If showing a stale payment method or expired permission causes a security or financial failure, don't cache it. Money, authorization tokens, and critical state must be fresh. If you must cache it, use write-through (synchronous cache + DB writes) so the cache never lags the database, rather than TTL-based eventual consistency.

High-cardinality requests. Queries with unique parameters (generated request IDs, coordinates, full-text search queries) generate an effectively unbounded set of distinct keys. The hit rate approaches zero, and the overhead of cache management (memory, eviction, network) exceeds any latency gain.

Cheap-to-compute data. A 2ms indexed query behind a 1ms Redis round-trip saves at most 1ms on a hit and costs an extra 1ms on every miss; add invalidation complexity and the net gain is rarely worth it. Only cache queries slower than 5-10ms where the hit rate is above 70%. ^{[Dean & Barroso, 2013]}
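The break-even point follows from a back-of-envelope model: every read pays the cache round-trip, and misses also pay the query, so caching only helps when cacheMs + (1 − h)·dbMs < dbMs, i.e. when the hit rate h exceeds cacheMs/dbMs. The sketch ignores write-back and invalidation costs, so it flatters caching:

```go
package main

import "fmt"

// expectedLatency models cache-aside read latency: every read pays the cache
// round-trip, and misses additionally pay the database query. It ignores the
// cost of writing the value back and of invalidation.
func expectedLatency(hitRate, cacheMs, dbMs float64) float64 {
	return cacheMs + (1-hitRate)*dbMs
}

// breakEvenHitRate is the hit rate at which caching merely matches going
// straight to the database: cacheMs + (1-h)*dbMs = dbMs  =>  h = cacheMs/dbMs.
func breakEvenHitRate(cacheMs, dbMs float64) float64 {
	return cacheMs / dbMs
}

func main() {
	// The 2ms indexed query from the text, behind a 1ms Redis round-trip.
	fmt.Printf("break-even: %.0f%%\n", 100*breakEvenHitRate(1, 2))             // 50%
	fmt.Printf("at 70%% hits: %.1fms vs 2.0ms\n", expectedLatency(0.7, 1, 2))   // 1.6ms
	// A 20ms query behind the same cache.
	fmt.Printf("break-even: %.0f%%\n", 100*breakEvenHitRate(1, 20))            // 5%
	fmt.Printf("at 70%% hits: %.1fms vs 20.0ms\n", expectedLatency(0.7, 1, 20)) // 7.0ms
}
```

At a 70% hit rate the cheap query gains 0.4ms while the slow one gains 13ms, which is why the guidance targets slow queries with high hit rates.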

Unencrypted sensitive data. Credit card numbers, SSNs, health records in Redis require app-level encryption. Redis has no field-level encryption; a breach exposes everything at once. Encrypt before writing, decrypt after reading, or skip caching entirely for high-sensitivity data.
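Encrypting before writing can be done with the Go standard library alone. This sketch uses AES-GCM and prepends the random nonce to the stored value; sealForCache and openFromCache are hypothetical names, and key management (a KMS or secret store) is out of scope. Binding the cache key as GCM additional data is an optional extra that makes a value copied under a different key fail to decrypt:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// sealForCache encrypts plaintext with AES-GCM and returns nonce||ciphertext,
// so the value stored in Redis is opaque without the key.
func sealForCache(key, plaintext, cacheKey []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// cacheKey is authenticated as additional data but not stored.
	return gcm.Seal(nonce, nonce, plaintext, cacheKey), nil
}

// openFromCache reverses sealForCache and rejects tampered or moved values.
func openFromCache(key, sealed, cacheKey []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, cacheKey)
}

func main() {
	key := make([]byte, 32) // demo only: load a real key from a secret store
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	sealed, _ := sealForCache(key, []byte(`{"ssn":"000-00-0000"}`), []byte("patient:7"))
	plain, err := openFromCache(key, sealed, []byte("patient:7"))
	fmt.Println(string(plain), err)
}
```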

Production Checklist

The four production patterns every cache layer needs

Singleflight in Go: collapse N concurrent misses on the same key into a single backend fetch. This is the canonical defense against cache stampedes, and it doesn't require probabilistic refresh:

import "golang.org/x/sync/singleflight"
 
var sfg singleflight.Group
 
func (c *Cache) Get(ctx context.Context, key string) ([]byte, error) {
    if v, ok := c.local.Get(key); ok { return v, nil }
 
    // singleflight: any concurrent Get(key) waits on the in-flight fetch.
    val, err, _ := sfg.Do(key, func() (any, error) {
        return c.fetchFromOrigin(ctx, key)
    })
    if err != nil { return nil, err }
    c.local.Set(key, val.([]byte))
    return val.([]byte), nil
}

XFetch (probabilistic early expiration) preempts the deterministic miss storm at the TTL boundary by giving each concurrent reader a small probability of refreshing slightly early, a probability that grows as expiry approaches:

import (
    "math"
    "math/rand/v2"
    "time"
)

// xFetchShouldRefresh returns true if the caller should pre-emptively refresh.
// delta is the typical time the origin fetch takes; beta is typically 1.0
// (values above 1 favor earlier refreshes). Refresh when
// ttlRemaining < delta * beta * -ln(r), with r uniform in [0, 1).
func xFetchShouldRefresh(ttlRemaining, delta time.Duration, beta float64) bool {
    r := rand.Float64() // named r to avoid shadowing the rand package
    threshold := delta.Seconds() * beta * (-math.Log(r))
    return ttlRemaining.Seconds() < threshold
}

Cache-key namespace convention: version-prefixed, so a deploy that changes the cached struct doesn't poison the cache, with a test guard that pins the key prefix to the schema version:

{service}:{entity}:v{schema}:{id}      # canonical
billing:invoice:v3:42                  # invoice schema v3, id=42
search:results:v1:hash:{queryHash}     # search results, keyed by a hash of the query

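import (
    "fmt"
    "reflect"
    "strings"
    "testing"
)

// Pin the cache-key version to the struct's declared schema version so a deploy
// that reshapes Invoice without bumping the version can be caught in a test.
const InvoiceSchemaVersion = 3

// Invoice carries its own schema version; bump it in lockstep with the struct.
type Invoice struct {
    SchemaVersion int    `json:"_schema"` // always == InvoiceSchemaVersion
    ID            string `json:"id"`
    // ... fields that, if changed, require bumping InvoiceSchemaVersion ...
}

// InvoiceCacheKey builds "billing:invoice:v{N}:{id}" from the single source of truth.
func InvoiceCacheKey(id string) string {
    return fmt.Sprintf("billing:invoice:v%d:%s", InvoiceSchemaVersion, id)
}

// structShape renders each field's name, type, and JSON tag, so renaming,
// retyping, adding, or removing a field changes the result.
func structShape(t reflect.Type) string {
    var b strings.Builder
    for i := 0; i < t.NumField(); i++ {
        f := t.Field(i)
        fmt.Fprintf(&b, "%s %s %q;", f.Name, f.Type, f.Tag.Get("json"))
    }
    return b.String()
}

// invoiceShapes records the approved shape for each schema version
// (shown for the two visible fields; record every field in real code).
var invoiceShapes = map[int]string{
    3: `SchemaVersion int "_schema";ID string "id";`,
}

// Test guard: a deploy that changes the struct without bumping the version
// fails here instead of silently poisoning the cache with mismatched payloads.
func TestInvoiceCacheVersionPinned(t *testing.T) {
    want, ok := invoiceShapes[InvoiceSchemaVersion]
    if !ok {
        t.Fatalf("no recorded shape for Invoice schema v%d; add one", InvoiceSchemaVersion)
    }
    if got := structShape(reflect.TypeOf(Invoice{})); got != want {
        t.Fatalf("Invoice shape changed without a version bump:\n got: %s\nwant: %s", got, want)
    }
    wantKey := fmt.Sprintf("billing:invoice:v%d:42", InvoiceSchemaVersion)
    if got := InvoiceCacheKey("42"); got != wantKey {
        t.Fatalf("cache key %q != %q: schema version drifted from the key prefix", got, wantKey)
    }
}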

Circuit breaker around the cache itself: when Redis is unreachable, serve stale data from a local LRU instead of forwarding the miss storm to your origin database. It is one of the most commonly skipped pieces of cache resilience:

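// GetOrStale wraps Redis reads in a circuit breaker (c.cb, assumed to expose
// Execute(func() (any, error)) (any, error)). When Redis errors or the circuit
// is open, it returns the stale local copy (if any) instead of hitting the
// origin: the origin is what the cache exists to protect.
func (c *Cache) GetOrStale(ctx context.Context, key string) ([]byte, error) {
    val, err := c.cb.Execute(func() (any, error) {
        b, err := c.redis.Get(ctx, key).Bytes()
        if errors.Is(err, redis.Nil) {
            // A miss is a healthy answer; don't count it toward tripping the breaker.
            return []byte(nil), nil
        }
        return b, err
    })
    if err == nil {
        b := val.([]byte)
        if b == nil {
            return nil, ErrCacheMiss // genuine miss: the caller may load from origin
        }
        return b, nil
    }
    // Redis failed or the circuit is open. staleLocal is assumed to be filled
    // on successful reads elsewhere.
    if stale, ok := c.staleLocal.Get(key); ok {
        c.metrics.staleServed.Inc()
        return stale, nil
    }
    return nil, err
}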

The pattern: only fall back to origin when stale-local is also empty. Otherwise a Redis blip becomes a thundering herd against the database — the exact failure mode the cache exists to prevent.

Cache Warming: Surviving Cold Starts

A freshly deployed pod or a flushed Redis cluster has a hit rate of zero. The first wave of traffic punches through to the database, and on a busy service that wave is the same shape as a stampede. Three warming strategies cover the real cases.

Backfill from a known hot-key list. Maintain a rolling list of the top N keys by hit count (most teams keep this in Redis itself, refreshed by a background sampler). On boot, the new pod fetches the list and populates its L1 LRU before serving traffic. If you keep the list at ~5,000 keys, warming finishes in 2-4 seconds and absorbs ~80% of post-deploy traffic on most catalog services. ^{[Dean & Barroso, 2013]}

Lazy-load-on-deploy with traffic shifting. Pair backfill with a load balancer that gives the new pod 10% of traffic for the first 30 seconds, then 50%, then 100%. The pod warms organically without any single replica taking the full miss-storm. This is the cheapest option when you already run a service mesh or weighted routing. ^{[Dean & Barroso, 2013]}

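Replay-on-promote for write-heavy caches. When a Redis replica is promoted to primary (failover, maintenance), it has the keys, but writes the old primary acknowledged and never replicated are gone, so the L1 copies on each pod can disagree with the new primary. Before accepting traffic, replay the last N seconds of writes against the local cache. Keyspace notifications are fire-and-forget Pub/Sub messages with no history, so the replay needs a durable log, such as a Redis Stream or the CDC topic. Critical for write-through tiers where staleness causes correctness bugs, not just slow pages.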

// WarmFromHotKeys fetches the top-N hot keys from Redis and populates L1.
// Run it at startup; the readiness probe reports ready only once tc.warmed
// (an atomic.Bool assumed on TieredCache) is true, so the pod gets no traffic until warm.
func (tc *TieredCache) WarmFromHotKeys(ctx context.Context, n int) error {
    keys, err := tc.l2.ZRevRange(ctx, "hot:keys", 0, int64(n-1)).Result()
    if err != nil {
        return fmt.Errorf("fetch hot key list: %w", err)
    }
    pipe := tc.l2.Pipeline()
    cmds := make(map[string]*redis.StringCmd, len(keys))
    for _, k := range keys {
        cmds[k] = pipe.Get(ctx, k)
    }
    if _, err := pipe.Exec(ctx); err != nil && err != redis.Nil {
        return fmt.Errorf("pipeline get: %w", err)
    }
    for k, cmd := range cmds {
        if val, err := cmd.Bytes(); err == nil {
            tc.l1.Add(k, val)
        }
    }
    tc.warmed.Store(true)
    return nil
}
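WarmFromHotKeys assumes something keeps the hot:keys sorted set current. In Redis that is typically a sampled ZINCRBY hot:keys 1 <key> on cache hits. The in-process sketch below (HotKeys is a hypothetical name) shows the same idea: count every Nth hit, then publish the top N and start a fresh window so the list tracks recent traffic rather than all-time traffic:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// HotKeys counts a sample of cache hits so a background job can publish the
// top N (for example into the hot:keys sorted set read at warm-up).
type HotKeys struct {
	mu     sync.Mutex
	counts map[string]int
	every  int // record 1 in every `every` hits to keep overhead low
	seen   int
}

func NewHotKeys(every int) *HotKeys {
	if every < 1 {
		every = 1
	}
	return &HotKeys{counts: map[string]int{}, every: every}
}

// Observe is called on each cache hit; only every Nth call is counted.
func (h *HotKeys) Observe(key string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.seen++
	if h.seen%h.every == 0 {
		h.counts[key]++
	}
}

// TopN returns the n most frequently sampled keys and resets the window.
func (h *HotKeys) TopN(n int) []string {
	h.mu.Lock()
	counts := h.counts
	h.counts = map[string]int{}
	h.mu.Unlock()

	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]]
		}
		return keys[i] < keys[j] // deterministic order for ties
	})
	if len(keys) > n {
		keys = keys[:n]
	}
	return keys
}

func main() {
	h := NewHotKeys(1)
	for _, k := range []string{"product:1", "product:2", "product:1", "product:3", "product:1", "product:2"} {
		h.Observe(k)
	}
	fmt.Println(h.TopN(2)) // [product:1 product:2]
}
```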

A Real Stampede Incident: Singleflight in Anger

A travel-pricing service ran a 30-second TTL on its quote:{route}:{date} keys to keep prices fresh. Under normal load (~6,000 req/s) the database absorbed the predictable miss wave at TTL boundary. During a viral promo, traffic spiked to 38,000 req/s. At the next TTL flip, all 38,000 in-flight requests for the top 12 routes hit the cache cold simultaneously: ~456,000 quote-engine calls fanned into a service capped at 4,000 concurrent computations. Quote latency climbed from 40ms p99 to 14 seconds. The pricing pool exhausted in 11 seconds and customer-facing checkout returned 503s.

The fix shipped in one deploy: wrap the quote lookup in singleflight keyed by route:date, and adopt negative-cache entries for routes the engine returned no-availability for. The negative entry uses a short, jittered TTL — long enough to absorb retry storms from the same client, short enough that legitimate availability changes propagate quickly.

// QuoteOrAbsent returns a quote, the sentinel ErrNoAvailability, or a real error.
// Concurrent callers for the same key share one origin call via singleflight.
func (s *PricingCache) QuoteOrAbsent(ctx context.Context, route, date string) (*Quote, error) {
    key := fmt.Sprintf("quote:%s:%s", route, date)
 
    if val, err := s.cache.Get(ctx, key); err == nil {
        if bytes.Equal(val, sentinelAbsent) {
            return nil, ErrNoAvailability
        }
        var q Quote
        if err := json.Unmarshal(val, &q); err == nil {
            return &q, nil
        }
    }
 
    v, err, _ := s.sfg.Do(key, func() (any, error) {
        q, err := s.engine.Quote(ctx, route, date)
        if errors.Is(err, ErrNoAvailability) {
            // Negative cache: jitter prevents synchronized re-fetch.
            ttl := 20*time.Second + time.Duration(rand.Int63n(int64(10*time.Second)))
            s.cache.Set(ctx, key, sentinelAbsent, ttl)
            return nil, ErrNoAvailability
        }
        if err != nil {
            return nil, err
        }
        data, _ := json.Marshal(q)
        s.cache.Set(ctx, key, data, 30*time.Second)
        return q, nil
    })
 
    if err != nil {
        return nil, err
    }
    return v.(*Quote), nil
}

After deploy, the same 38,000 req/s spike produced 1,100 quote-engine calls (one per pod per hot key) instead of 456,000. P99 stayed under 90ms.

Negative Caching: TTL Math for Absent Data

Negative caching means storing the answer "this does not exist" with a deliberate TTL. It defends against repeated lookups for missing rows — bot traffic probing for /users/{random-uuid}, expired session IDs, deleted SKU pages — that would otherwise reach the database on every request.

The TTL math is sharper than for positive entries. A positive cache entry trades freshness for hit rate; a negative entry trades latency on first appearance for hit rate. If a real user creates the resource that the cache currently says is absent, they wait until the negative TTL expires before seeing their own data. Three rules: keep negative TTLs at least 5x shorter than positive TTLs (typically 10-60s); invalidate negative entries explicitly on the create path (run cache.Delete right after the INSERT commits, since a cache delete cannot join the database transaction); and always jitter to avoid synchronized expiry across clients. Without jitter, negative caching turns a steady miss stream into a synchronized stampede every TTL window, which is exactly the failure mode it was meant to prevent. ^{[Cache stampede paper]}

Frequently Asked Questions

What is a cache stampede?

When cache keys expire simultaneously at peak traffic, all requests miss the cache at once and hammer the database. A 40,000-concurrent-user load at TTL expiry can send 40,000 queries per second to a database designed for 800 QPS. Prevent it with singleflight (deduplicate concurrent cache misses) or probabilistic early expiration.

What's the difference between cache-aside and write-through?

Cache-aside: the app checks the cache, handles misses by fetching from the DB, and populates the cache itself. Write-through: the app writes to the cache first, and the cache synchronously writes to the DB. Cache-aside suits read-heavy workloads and is simple. Write-through keeps the cache always warm but adds write latency.

When should you avoid caching?

Don't cache write-heavy, read-rare data (inventory updates 100x/sec, read 1x/sec). Skip strong-consistency data (payments, permissions). Avoid high-cardinality requests (unique IDs = infinite key space). Don't cache cheap queries (indexed lookup faster than cache + fallback). ^{[Dean & Barroso, 2013]}

How do you invalidate caches without TTL?

Event-driven: publish invalidation events on data changes to a message queue; subscriber deletes cache keys. CDC (Change Data Capture): tools like Debezium watch the database WAL for all changes and trigger cache invalidation. Tag-based: store keys under tags; invalidating a tag evicts all tagged keys at once.

