#database-internals #redis #system-design #caching #performance #scaling

Scaling Redis for High-Throughput Systems

BackendBytes Engineering Team

Oct 24, 2024

16 min read

Part of Series: Distributed Systems Mastery

Lesson 6 of 6

→Redis is single-threaded for command processing: one CPU core, roughly 100-200K ops/sec max. In the flash sale, 600K users all requested a key on one shard, so one core hit 100% CPU while five shards sat idle
→Hot keys bottleneck on a single shard when access patterns concentrate — mitigate with 1-second in-process LRU (99x reduction) for reads or replicated key reads for high-throughput keys
→Pipelined batches (100 commands, 1 round trip) are 10-50x faster than sequential commands; network overhead dominates for small operations
→Redis Cluster uses 16,384 fixed hash slots; multi-key operations need hash tags ({user:123}:*) to land on same slot — plan key structure upfront or migrations are painful

One trending key pins a single CPU core while five sibling shards sit green. A flash-sale or trending-content workload pushes every concurrent user at the same key: one shard, one bottleneck. We have debugged this exact failure mode with multiple production teams, and the trace looks the same every time.

The Flash Sale That Bottlenecked on a Single Key

The hot-key failure mode — a pattern we see repeatedly in flash-sale and trending-content workloads:

A 6-shard Redis Cluster, each shard rated for ~100k ops/sec on a single core (Redis is single-threaded for command processing^{[Redis Docs]}), runs a flash sale. The tests pass. But every concurrent user requests the same key, product:flash-sale:current, which means the same hash slot, the same primary shard, and the same CPU core. The other five shards sit idle.

Within 30 seconds the hot core saturates, cache latency spikes from sub-millisecond to tens of milliseconds, the database absorbs the miss storm, and checkout fails for a sizeable fraction of users while the rest of the cluster shows green dashboards.

The diagnostic pattern is the same every time: cluster-level metrics look fine, single-shard CPU is pinned at 100%.

The Short Version

Redis scales horizontally via Cluster sharding (16,384 hash slots across nodes), but bottlenecks when access patterns concentrate on one key or node. Prevent hot keys with in-process LRU caches or key replication. Use pipelined batches instead of sequential commands. Scale connection pools carefully — concurrent Pod deploys can saturate Redis's max connections. Tune eviction and monitor CPU, not just ops/sec.

Scale via Redis Cluster (multi-key ops need hash tags: {user:123}:profile)
Mitigate hot keys with 1-second local LRU caches (99x reduction) or replicated reads
Batch commands with pipelining (10-50x faster than sequential ops) ^{[Redis pipelining]}
Pool connections correctly: 50 conns per Pod × 20 Pods = 1,000 Redis connections
Eviction policies: set maxmemory-policy allkeys-lru for cache workloads, tune maxmemory
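
The pool arithmetic above is worth checking before a deploy rather than during one. Here is a minimal sketch, assuming the Redis default maxclients of 10000. The function name and the reserved-headroom parameter are hypothetical and only illustrate the calculation.

```go
package main

import "fmt"

// connectionBudget returns how many client connections one Redis node should
// expect when every pod opens a full pool, and whether that fits under the
// server's maxclients. reserved is headroom kept for replicas, monitoring and
// redis-cli sessions (a hypothetical safety margin, not a Redis setting).
func connectionBudget(poolPerPod, pods, maxClients, reserved int) (total int, fits bool) {
	total = poolPerPod * pods
	return total, total+reserved <= maxClients
}

func main() {
	// Figures from the text: 50 connections per pod, 20 pods.
	total, fits := connectionBudget(50, 20, 10000, 100)
	fmt.Printf("steady state: %d connections, fits=%v\n", total, fits)

	// A larger pool and fleet can exceed the default limit.
	total, fits = connectionBudget(200, 60, 10000, 100)
	fmt.Printf("large fleet: %d connections, fits=%v\n", total, fits)
}
```

A rolling deploy that briefly runs old and new pods side by side raises the pod count, so it is the deploy-time figure that should fit under the limit.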

The Quick Start: Sentinel vs Cluster

Redis is single-threaded for command processing — one CPU core, roughly 100–200k ops/sec max per instance per Redis's own benchmarks^{[Redis Docs]}. Choose the right topology before scaling:

	Redis Sentinel	Redis Cluster
Purpose	High availability (single dataset)	Horizontal scaling across nodes
Sharding	None — all data on one primary	16,384 hash slots distributed
Max throughput	100–200k ops/sec	100k × N nodes (~1M for 10 nodes)
Multi-key ops	MGET/MSET always work	Require hash tags: `{user:123}:*`
When to use	< 25GB data, < 100k ops/sec	Larger datasets or higher throughput

Start with Sentinel. Add Cluster when you hit a throughput or RAM limit on a single primary.

Setting Up Redis Cluster

Redis Cluster^{[Redis Cluster spec]} uses 16,384 fixed hash slots. Every key maps to one slot: slot = CRC16(key) % 16384. This distributes load across nodes — unless access patterns concentrate on a few keys.

The cluster topology in one picture — gossip-based discovery + slot ownership + client-side routing:

graph TB
    Client[Application client<br/>jedis / lettuce / go-redis] -->|MOVED redirect<br/>updates slot map cache| Slots[Slot map<br/>0-5460 P1<br/>5461-10922 P2<br/>10923-16383 P3]
    Slots --> P1[Primary 1<br/>slots 0-5460]
    Slots --> P2[Primary 2<br/>slots 5461-10922]
    Slots --> P3[Primary 3<br/>slots 10923-16383]
    P1 -.->|async replication| R1[Replica 1]
    P2 -.->|async replication| R2[Replica 2]
    P3 -.->|async replication| R3[Replica 3]
    P1 <-->|gossip protocol<br/>cluster bus on<br/>port + 10000| P2
    P2 <-->|gossip| P3
    P3 <-->|gossip| P1
    P1 -.->|failure detected<br/>by majority| Failover[Sentinel-style<br/>auto-failover<br/>R1 promotes to primary]
    style P1 fill:#dfd
    style P2 fill:#dfd
    style P3 fill:#dfd
    style R1 fill:#ffd
    style R2 fill:#ffd
    style R3 fill:#ffd
    style Failover fill:#fdd

Three production rules visible: (1) the client caches the slot map and only refreshes on MOVED redirect; (2) gossip happens on port + 10000 (e.g. 16379 for default Redis 6379); (3) failover requires majority quorum — a 3-primary cluster tolerates 1 primary loss.
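
To make rule (1) concrete, here is a minimal sketch of a client-side slot map that updates itself from a MOVED reply. The slotMap type and its methods are hypothetical and are not how jedis, lettuce, or go-redis implement routing. Real clients typically refresh the whole map after a redirect rather than patching one slot.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// slotMap is a toy client-side routing table: one owner address per hash slot.
type slotMap [16384]string

// assign marks the inclusive slot range [from, to] as owned by addr.
func (m *slotMap) assign(from, to int, addr string) {
	for s := from; s <= to; s++ {
		m[s] = addr
	}
}

// parseMoved parses an error reply of the form "MOVED <slot> <host:port>".
func parseMoved(reply string) (slot int, addr string, ok bool) {
	parts := strings.Fields(reply)
	if len(parts) != 3 || parts[0] != "MOVED" {
		return 0, "", false
	}
	n, err := strconv.Atoi(parts[1])
	if err != nil || n < 0 || n >= 16384 {
		return 0, "", false
	}
	return n, parts[2], true
}

// handleReply updates the cached owner when the server answers with MOVED and
// reports the address the command should be retried against.
func (m *slotMap) handleReply(reply string) (retryAddr string, redirected bool) {
	slot, addr, ok := parseMoved(reply)
	if !ok {
		return "", false
	}
	m[slot] = addr
	return addr, true
}

func main() {
	var m slotMap
	m.assign(0, 5460, "redis-1:6379")
	m.assign(5461, 10922, "redis-2:6379")
	m.assign(10923, 16383, "redis-3:6379")

	// Slot 3999 has been migrated; the old owner answers with a redirect.
	addr, redirected := m.handleReply("MOVED 3999 redis-2:6379")
	fmt.Println(redirected, addr, m[3999], m[4000])
}
```

Only the redirected slot changes owner in this sketch; neighbouring slots keep their cached owner until they are redirected too.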

Cluster setup (3 primaries, 3 replicas):

redis-cli --cluster create \
  redis-1:6379 redis-2:6379 redis-3:6379 \
  redis-4:6379 redis-5:6379 redis-6:6379 \
  --cluster-replicas 1

Multi-key operations (MGET, MSET, transactions) only work when every key maps to the same hash slot. Hash tags guarantee that: when a key contains {tag}, only the text inside the braces is hashed, so keys sharing a tag share a slot:

// All keys with {user:123} land on the same slot
keys := []string{
	"{user:123}:profile",
	"{user:123}:orders",
	"{user:123}:prefs",
}
results, err := client.MGet(ctx, keys...).Result() // Works: all on same slot

Without hash tags, MGET across different slots returns a CROSSSLOT error. Plan your key structure upfront — once you add hash tags, changing them requires a migration.

Cluster client setup in Go (go-redis v9):

import (
	"time"

	"github.com/redis/go-redis/v9"
)
 
client := redis.NewClusterClient(&redis.ClusterOptions{
	Addrs: []string{"redis-1:6379", "redis-2:6379", "redis-3:6379"},
	RouteByLatency: true,        // Route read-only commands to the lowest-latency node (primary or replica)
	PoolSize: 50,                // Connections per node
	MaxRedirects: 8,             // Retry on resharding
	ReadTimeout: 500 * time.Millisecond,
	WriteTimeout: 500 * time.Millisecond,
})

Hot Key Mitigation: In-Process Cache + Replication

^{[Cache stampede paper]}

A hot key is a single key receiving disproportionate requests, causing one shard to bottleneck. Symptoms: one node at 100% CPU while others idle, latency spikes for that key.

The mitigation stack is layered — each tier catches the fraction of traffic that still makes it through the layer above:

graph LR
    Req["N requests"] --> L1{"In-process<br/>LRU hit?"}
    L1 -->|hit| Serve["serve locally<br/>~0 network"]
    L1 -->|miss| L2{"Hot key<br/>replicated?"}
    L2 -->|yes| Replica["any of R replicas<br/>N/R per shard"]
    L2 -->|no| Shard["single shard<br/>full N"]
    Replica --> Redis[("Redis")]
    Shard --> Redis

A 1-second local LRU TTL cuts 100K req/s on a hot key to roughly 1K req/s reaching Redis, and 10-way replica fan-out spreads that remainder across copies, so no single shard comes near its ceiling.

Solution 1: Local in-process LRU cache (read-heavy keys)

For a product receiving 100k reads/sec, a 1-second local LRU TTL reduces Redis load by 99x: ^{[Beyer et al., 2016]}

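import (
	"context"
	"fmt"
	"time"

	lru "github.com/hashicorp/golang-lru/v2"
	"github.com/redis/go-redis/v9"
)
 
// CacheEntry is a locally cached value plus the time it stops being valid.
type CacheEntry struct {
	Data    []byte
	Expires time.Time
}
 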
func GetProductCached(ctx context.Context, client *redis.ClusterClient, cache *lru.Cache[string, CacheEntry], productID string) ([]byte, error) {
	// Check local cache first (microseconds)
	if entry, ok := cache.Get(productID); ok && time.Now().Before(entry.Expires) {
		return entry.Data, nil
	}
 
	// Cache miss — fetch from Redis
	data, err := client.Get(ctx, fmt.Sprintf("product:%s", productID)).Bytes()
	if err != nil {
		return nil, err
	}
 
	// Store locally for 1 second
	cache.Add(productID, CacheEntry{
		Data:    data,
		Expires: time.Now().Add(1 * time.Second),
	})
	return data, nil
}

For product catalogs or config, a 1-second local TTL is acceptable. For inventory counts, use replicas instead.
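
The 99x figure is easier to trust once you can reproduce it. Here is a minimal simulation, assuming a single process that sees 100 evenly spaced reads per second for one key. The ttlCache type, the injected clock, and the simulate helper are hypothetical stand-ins for the LRU and the Redis GET.

```go
package main

import (
	"fmt"
	"time"
)

type entry struct {
	data    []byte
	expires time.Time
}

// ttlCache is a single-goroutine stand-in for the in-process LRU: no size
// bound and no locking, only the expiry logic.
type ttlCache struct {
	ttl     time.Duration
	now     func() time.Time                 // injectable clock for the simulation
	load    func(key string) ([]byte, error) // stands in for the Redis GET
	entries map[string]entry
	loads   int // reads that reached the backend
}

func (c *ttlCache) get(key string) ([]byte, error) {
	if e, ok := c.entries[key]; ok && c.now().Before(e.expires) {
		return e.data, nil
	}
	c.loads++
	data, err := c.load(key)
	if err != nil {
		return nil, err
	}
	c.entries[key] = entry{data: data, expires: c.now().Add(c.ttl)}
	return data, nil
}

// simulate issues evenly spaced reads for one key and returns how many of
// them reached the backend.
func simulate(readsPerSecond, seconds, ttlMillis int) int {
	clock := time.Unix(0, 0)
	c := &ttlCache{
		ttl:     time.Duration(ttlMillis) * time.Millisecond,
		now:     func() time.Time { return clock },
		load:    func(string) ([]byte, error) { return []byte("v"), nil },
		entries: map[string]entry{},
	}
	step := time.Second / time.Duration(readsPerSecond)
	for i := 0; i < readsPerSecond*seconds; i++ {
		c.get("product:42")
		clock = clock.Add(step)
	}
	return c.loads
}

func main() {
	loads := simulate(100, 10, 1000)
	fmt.Printf("1000 reads, %d reached the backend\n", loads)
}
```

Under these assumptions one read per second per process misses and the other 99 are served locally. The saving therefore scales with how many reads each process sees per TTL window, and a fleet of many lightly loaded pods gets less benefit than a few busy ones.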

Solution 2: Key replication (writable keys)

Store N copies on different slots, read from a random replica:

func SetProductReplicated(ctx context.Context, client *redis.ClusterClient, baseKey string, data []byte, replicas int) error {
	for i := 0; i < replicas; i++ {
		// Copies hash independently, so they usually land on different slots
		key := fmt.Sprintf("product:%d:%s", i, baseKey)
		if err := client.Set(ctx, key, data, 24*time.Hour).Err(); err != nil {
			return err
		}
	}
	return nil
}
 
func GetProductReplicated(ctx context.Context, client *redis.ClusterClient, baseKey string, replicas int) ([]byte, error) {
	// Pick a random copy to spread reads (rand is math/rand/v2, Go 1.22+)
	idx := rand.IntN(replicas)
	key := fmt.Sprintf("product:%d:%s", idx, baseKey)
	return client.Get(ctx, key).Bytes()
}

With 10 copies, each copy serves ~10k RPS instead of one key absorbing all 100k. Copies that hash to the same shard still share its core, so check the resulting slot spread.

Pipelining: High-Throughput Batch Reads

At 1ms RTT, 1,000 sequential GET commands take 1 second of pure network overhead. Pipelining^{[Redis pipelining]} batches all commands into one round trip:

// Without pipelining: 1000 RTTs (~1 second)
for _, id := range ids {
	data, _ := client.Get(ctx, fmt.Sprintf("product:%s", id)).Bytes()
	results = append(results, data)
}
 
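// With pipelining: ~1 RTT per cluster shard
cmds, err := client.Pipelined(ctx, func(pipe redis.Pipeliner) error {
	for _, id := range ids {
		pipe.Get(ctx, fmt.Sprintf("product:%s", id))
	}
	return nil
})
// Pipelined returns the first command error, including redis.Nil for a
// missing key, so only other errors are treated as fatal.
if err != nil && err != redis.Nil {
	return nil, err // assumes an enclosing func returning ([][]byte, error)
}
 
// Extract results, skipping keys that were missing or failed
for _, cmd := range cmds {
	if data, err := cmd.(*redis.StringCmd).Bytes(); err == nil {
		results = append(results, data)
	}
}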

go-redis's ClusterClient groups pipelined commands by the node that owns each slot and sends one batch per node. For 100 IDs, pipelined reads are 10-50x faster than sequential. ^{[Redis pipelining]}
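
The network side of that speedup is plain arithmetic. The sketch below counts round trips only and ignores server execution and serialization time. That omission is why measured gains land nearer 10-50x than the theoretical 100x. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// sequentialCost is the network time for n commands sent one at a time.
func sequentialCost(n int, rtt time.Duration) time.Duration {
	return time.Duration(n) * rtt
}

// pipelinedCost is the network time when commands are sent in batches of
// batchSize, one round trip per batch. Server-side time is ignored.
func pipelinedCost(n, batchSize int, rtt time.Duration) time.Duration {
	if batchSize < 1 {
		batchSize = 1
	}
	batches := (n + batchSize - 1) / batchSize // ceiling division
	return time.Duration(batches) * rtt
}

func main() {
	rtt := time.Millisecond
	fmt.Println("sequential:", sequentialCost(1000, rtt))
	fmt.Println("pipelined, batches of 100:", pipelinedCost(1000, 100, rtt))
}
```

At 1ms RTT this gives 1s for the sequential case and 10ms for ten batches of 100.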

Production Checklist

^{[Redis Cluster spec]}

Set maxmemory-policy allkeys-lru in redis.conf for cache workloads; use volatile-lru if mixing cached and persistent data
Set maxmemory to 50% of physical RAM (leave headroom for fork overhead during RDB/AOF rewrites)
Monitor cache hit rate from redis-cli INFO stats: keyspace_hits / (keyspace_hits + keyspace_misses). Target > 95%
Pool size: total connections = connections_per_pod × num_pods, and the total must stay below the server's maxclients. Start with 50 per pod; adjust if you see connection errors
Stagger pod restarts with readiness probes that warm the connection pool — prevents connection storms
Replica read routing: RouteByLatency: true in go-redis sends read-only commands to the lowest-latency node, which offloads reads to replicas when they are closer
Eviction rate: monitor evicted_keys — if rising, your hot set is larger than maxmemory or your TTLs are too aggressive
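Replication lag: replicas should be within 10ms of primary. Monitor the gap between the primary's master_repl_offset and each replica's slave_repl_offset; the offsets are byte counts, so alert on a gap that keeps growing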
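
The hit-rate item in the checklist needs both counters, not just keyspace_hits. Here is a minimal sketch that computes the ratio from the text returned by INFO stats. The hitRate function is hypothetical. The counters are cumulative since the last restart or CONFIG RESETSTAT, so a dashboard would normally work from deltas between samples.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hitRate extracts keyspace_hits and keyspace_misses from INFO stats output
// and returns hits / (hits + misses). ok is false when either counter is
// missing, unparsable, or no lookups have happened yet.
func hitRate(info string) (rate float64, ok bool) {
	counters := map[string]float64{}
	for _, line := range strings.Split(info, "\n") {
		name, value, found := strings.Cut(strings.TrimSpace(line), ":")
		if !found {
			continue
		}
		if name == "keyspace_hits" || name == "keyspace_misses" {
			n, err := strconv.ParseFloat(value, 64)
			if err != nil {
				return 0, false
			}
			counters[name] = n
		}
	}
	hits, hasHits := counters["keyspace_hits"]
	misses, hasMisses := counters["keyspace_misses"]
	if !hasHits || !hasMisses || hits+misses == 0 {
		return 0, false
	}
	return hits / (hits + misses), true
}

func main() {
	sample := "# Stats\r\nkeyspace_hits:950\r\nkeyspace_misses:50\r\n"
	rate, ok := hitRate(sample)
	fmt.Printf("hit rate %.1f%% (ok=%v)\n", rate*100, ok)
}
```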

Conclusion

The flash sale engineering team's second run, 48 hours after the incident: 800k concurrent users, peak throughput of 1.2M ops/sec, no shard above 40% CPU. Checkout success rate: 99.7%. ^{[Beyer et al., 2016]}

Scaling Redis isn't about raw ops/sec on a spec sheet. It's about understanding that a single instance has a hard ceiling (~100-200k ops/sec), and that uneven key distribution can make a 6-shard cluster behave like a single-threaded machine. The patterns that prevent this — key replication, local caching, pipelined batches, connection pooling discipline — are boring infrastructure decisions, not technical breakthroughs. But they're what separate a flash sale that works from one that fails.

Start with Sentinel and a single instance. Move to Cluster only when you hit throughput limits. Prevent hot keys with in-process LRU caches (99x load reduction). Batch reads with pipelining. Pool connections with discipline. Monitor hit rates, not just ops/sec. ^{[Redis pipelining]}

go-redis pool sizing that survives a slow Redis

Default redis.NewClient ships with PoolSize = 10 * runtime.GOMAXPROCS(0) and a 3-second read timeout. That is fine until Redis blocks on a single slow command and every goroutine in your service queues behind it for seconds. The configuration below is what we run on services pushing 200k+ ops/sec:

import (
	"time"
 
	"github.com/redis/go-redis/v9"
)
 
func NewRedisClient(addr string) *redis.Client {
	return redis.NewClient(&redis.Options{
		Addr: addr,
 
		// Pool sized for concurrency, not for parallelism.
		// PoolSize > NumCPU is fine — Redis is single-threaded but our
		// goroutines block on the network round-trip, not CPU.
		PoolSize:     200,
		MinIdleConns: 20,
 
		// The four timeouts that fail-fast instead of cascading:
		// DialTimeout — TCP connect ceiling, before any command runs.
		// ReadTimeout — per-command read budget; tighter than HTTP request budget.
		// WriteTimeout — protects against TCP backpressure from saturated link.
		// PoolTimeout — how long a goroutine waits for a free connection;
		//                exceeding this returns an error instead of hanging.
		DialTimeout:  500 * time.Millisecond,
		ReadTimeout:  200 * time.Millisecond,
		WriteTimeout: 200 * time.Millisecond,
		PoolTimeout:  100 * time.Millisecond,
 
		// Idle connection hygiene — beats most NAT/firewall idle drops.
		ConnMaxIdleTime: 5 * time.Minute,
		ConnMaxLifetime: 30 * time.Minute,
	})
}
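
PoolTimeout is the setting that turns a stalled Redis into fast errors instead of a pile of blocked goroutines. Here is a toy model of that behaviour using only the standard library. The pool type is hypothetical and far simpler than go-redis's real pool.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errPoolTimeout = errors.New("pool timeout")

// pool is a toy connection pool: a buffered channel holding one token per
// connection.
type pool struct {
	tokens  chan struct{}
	timeout time.Duration
}

func newPool(size int, timeout time.Duration) *pool {
	p := &pool{tokens: make(chan struct{}, size), timeout: timeout}
	for i := 0; i < size; i++ {
		p.tokens <- struct{}{}
	}
	return p
}

// acquire waits at most p.timeout for a free connection, then gives up.
func (p *pool) acquire() error {
	timer := time.NewTimer(p.timeout)
	defer timer.Stop()
	select {
	case <-p.tokens:
		return nil
	case <-timer.C:
		return errPoolTimeout
	}
}

// release returns a connection to the pool.
func (p *pool) release() { p.tokens <- struct{}{} }

func main() {
	p := newPool(2, 50*time.Millisecond)
	// Two callers take both connections and hold them (a slow command).
	_ = p.acquire()
	_ = p.acquire()

	// The third caller fails after the timeout instead of waiting forever.
	fmt.Println("third caller:", p.acquire())

	p.release()
	fmt.Println("after release:", p.acquire())
}
```

The caller that times out can shed load or fall back to the database, which is usually better than holding an HTTP request open behind a queue of waiters.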

The matching pipeline pattern turns 1000 sequential GETs (1s of network RTT) into one batched round trip (~5ms). The detail most teams miss: in Redis Cluster a batch that spans shards is split per node, and nothing preserves ordering across nodes. Grouping keys by hash slot before issuing the batch makes that split explicit, so a failure on one shard surfaces as an error for that group instead of a partial response: ^{[Redis pipelining]}

type batchedReader struct {
	rdb redis.UniversalClient
}
 
// MGetSlotAware batches reads, grouping keys by their hash slot so each
// pipeline targets exactly one slot (and therefore one primary). The cost is
// one round trip per slot group instead of one per key.
func (b *batchedReader) MGetSlotAware(ctx context.Context, keys []string) (map[string]string, error) {
	groups := groupKeysBySlot(keys)         // {slot: []key}
	out := make(map[string]string, len(keys))
 
	pipes := make([]redis.Pipeliner, 0, len(groups))
	cmds  := make([][]*redis.StringCmd, 0, len(groups))
 
	for _, group := range groups {
		pipe := b.rdb.Pipeline()
		groupCmds := make([]*redis.StringCmd, len(group))
		for i, k := range group {
			groupCmds[i] = pipe.Get(ctx, k)
		}
		pipes = append(pipes, pipe)
		cmds = append(cmds, groupCmds)
	}
 
	// Execute the pipelines one after another and gather results; run them
	// in goroutines if the round trips need to overlap.
	for i, pipe := range pipes {
		if _, err := pipe.Exec(ctx); err != nil && err != redis.Nil {
			return nil, err
		}
		for _, c := range cmds[i] {
			if v, err := c.Result(); err == nil {
				out[c.Args()[1].(string)] = v
			}
		}
	}
	return out, nil
}

The bug this fixes: hand-rolled MGET against Cluster fails with CROSSSLOT Keys in request don't hash to the same slot for any batch spanning multiple shards. Slot-aware pipelining is the canonical workaround.

The slot computation itself — pure CRC16 against the key (or the contents of the first {...} segment if present, the hash-tag escape hatch for forcing two keys to the same slot):

import "strings"
 
// slot returns the Redis Cluster hash slot for a key: CRC16(key) mod 16384,
// using the same CRC16/XMODEM (CCITT) variant Redis uses — polynomial 0x1021,
// initial value 0x0000. If the key contains a {tag} substring, only the tag is
// hashed, which forces co-location for keys you must batch atomically.
func slot(key string) uint16 {
    if start := strings.IndexByte(key, '{'); start >= 0 {
        if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
            key = key[start+1 : start+1+end] // hash only the {...} contents
        }
    }
    var crc uint16
    for i := 0; i < len(key); i++ {
        crc ^= uint16(key[i]) << 8
        for j := 0; j < 8; j++ {
            if crc&0x8000 != 0 {
                crc = (crc << 1) ^ 0x1021
            } else {
                crc <<= 1
            }
        }
    }
    return crc % 16384
}
 
// Group keys by Cluster slot. {tag} hash-tag forces co-location for keys
// you must batch atomically (the only path to safe MULTI/EXEC across keys
// in Cluster mode).
func groupKeysBySlot(keys []string) map[uint16][]string {
    out := make(map[uint16][]string)
    for _, k := range keys {
        s := slot(k) // compute the slot once per key
        out[s] = append(out[s], k)
    }
    return out
}
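
Grouping by slot can produce many small pipelines when keys scatter across thousands of slots. One possible refinement is to merge slot groups owned by the same node, since a node accepts a pipeline of single-key commands for any slots it owns. Here is a minimal sketch, assuming a slot-range table like the one in the topology diagram. The slotRange type, both helpers, and the slot numbers in main are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// slotRange says which node owns the inclusive slot range [from, to].
type slotRange struct {
	from, to uint16
	node     string
}

// nodeFor looks up the owner of a slot in the range table.
func nodeFor(slot uint16, ranges []slotRange) (string, bool) {
	for _, r := range ranges {
		if slot >= r.from && slot <= r.to {
			return r.node, true
		}
	}
	return "", false
}

// groupByNode merges per-slot key groups into one group per owning node.
func groupByNode(bySlot map[uint16][]string, ranges []slotRange) map[string][]string {
	out := make(map[string][]string)
	for slot, keys := range bySlot {
		node, ok := nodeFor(slot, ranges)
		if !ok {
			continue // unknown owner: a real client would refresh its slot map
		}
		out[node] = append(out[node], keys...)
	}
	for _, keys := range out {
		sort.Strings(keys) // deterministic order for the demo
	}
	return out
}

func main() {
	ranges := []slotRange{
		{0, 5460, "redis-1:6379"},
		{5461, 10922, "redis-2:6379"},
		{10923, 16383, "redis-3:6379"},
	}
	// Illustrative slot numbers, not computed from the key names.
	bySlot := map[uint16][]string{100: {"a"}, 200: {"b"}, 6000: {"c"}, 12000: {"d"}}
	for node, keys := range groupByNode(bySlot, ranges) {
		fmt.Println(node, keys)
	}
}
```

Four slot groups collapse into three node groups here. With thousands of keys the reduction is from one round trip per slot to one per shard.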

Streams vs Pub/Sub: pick durability deliberately

Classic Redis Pub/Sub is fire-and-forget. A subscriber that disconnects for two seconds loses every message published in that window, and there is no acknowledgement, no replay, no consumer group. That works for cache-invalidation fan-out where a missed message just means a slightly stale read on one node, but it falls apart the moment you reach for it as a work queue or event log. Redis Streams (XADD, XREADGROUP, XACK) close that gap: messages persist to RDB and AOF, consumer groups distribute work across workers with at-least-once delivery, and a pending entries list (PEL) tracks every unacked message so a crashed consumer's work can be claimed by another via XCLAIM or auto-claimed via XAUTOCLAIM after an idle threshold. The trade-off is memory pressure — a stream grows until you cap it with MAXLEN or MINID.

The decision rule we follow in production: use Pub/Sub only for ephemeral coordination signals where loss is acceptable (cache invalidation, presence notifications, leader-election heartbeats). Use Streams for any payload representing a durable event — payment intents, audit records, async job dispatch. The Go consumer below shows the canonical pattern with bounded retries and dead-letter routing for poison messages.

// Stream consumer with consumer-group semantics, bounded retries,
// and a dead-letter stream for poison messages. Run one goroutine per
// worker; the consumer name should be unique per process (hostname+pid).
func consume(ctx context.Context, rdb *redis.Client, group, consumer string) error {
    const maxDeliveries = 5
    for {
        res, err := rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
            Group:    group,
            Consumer: consumer,
            Streams:  []string{"orders", ">"},
            Count:    16,
            Block:    5 * time.Second,
        }).Result()
        if err == redis.Nil {
            continue // idle, no new messages
        }
        if err != nil {
            return fmt.Errorf("xreadgroup: %w", err)
        }
        for _, msg := range res[0].Messages {
            if err := handle(ctx, msg.Values); err != nil {
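                // Inspect the delivery count via XPENDING; route poison messages
                // to a DLQ stream rather than blocking the group forever.
                // Reading with ">" never redelivers, so the count only grows
                // when the message is claimed again (XCLAIM / XAUTOCLAIM).
                pending, perr := rdb.XPendingExt(ctx, &redis.XPendingExtArgs{
                    Stream: "orders", Group: group, Start: msg.ID, End: msg.ID, Count: 1,
                }).Result()
                if perr == nil && len(pending) > 0 && pending[0].RetryCount >= maxDeliveries {
                    // Ack only after the DLQ write succeeds, so the message is never dropped.
                    if err := rdb.XAdd(ctx, &redis.XAddArgs{Stream: "orders.dlq", Values: msg.Values}).Err(); err == nil {
                        rdb.XAck(ctx, "orders", group, msg.ID)
                    }
                }
                continue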
            }
            rdb.XAck(ctx, "orders", group, msg.ID)
        }
    }
}

Pair this with a janitor that runs XAUTOCLAIM every 30 seconds against messages idle for longer than your processing SLA — that way a Kubernetes pod eviction does not strand work in the PEL until manual intervention. Cap stream length with XADD orders MAXLEN ~ 1000000 * so producers approximately bound memory; the ~ lets Redis trim in chunks instead of one entry at a time, which keeps XADD latency stable under load.
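
The interaction between the PEL, the janitor, and the delivery count is easier to follow in a model than against a live server. The sketch below is a toy in-memory imitation of the bookkeeping, not Redis itself. The pel type and its methods are hypothetical, and time is a plain count of seconds.

```go
package main

import "fmt"

// pendingEntry mirrors what the pending entries list tracks per message.
type pendingEntry struct {
	consumer    string
	deliveredAt int64 // seconds since start, for simplicity
	deliveries  int
}

// pel is a toy model of one consumer group's pending entries list.
type pel map[string]*pendingEntry

// deliver records a first delivery, as a read with ">" would.
func (p pel) deliver(id, consumer string, now int64) {
	p[id] = &pendingEntry{consumer: consumer, deliveredAt: now, deliveries: 1}
}

// ack removes the entry, as XACK would.
func (p pel) ack(id string) { delete(p, id) }

// autoClaim reassigns entries idle for at least minIdle seconds to consumer
// and bumps their delivery count. Entries already delivered maxDeliveries
// times are returned separately so the caller can dead-letter them.
func (p pel) autoClaim(consumer string, minIdle int64, maxDeliveries int, now int64) (retry, dead []string) {
	for id, e := range p {
		if now-e.deliveredAt < minIdle {
			continue
		}
		if e.deliveries >= maxDeliveries {
			dead = append(dead, id)
			continue
		}
		e.consumer, e.deliveredAt = consumer, now
		e.deliveries++
		retry = append(retry, id)
	}
	return retry, dead
}

func main() {
	p := pel{}
	p.deliver("1-0", "worker-a", 0) // worker-a reads the message, then its pod is evicted

	// The janitor runs every 30 seconds and claims anything idle for 30s or more.
	for now := int64(30); now <= 90; now += 30 {
		retry, dead := p.autoClaim("janitor", 30, 3, now)
		fmt.Printf("t=%ds retry=%v dead=%v\n", now, retry, dead)
		for _, id := range dead {
			p.ack(id) // after copying the payload to the dead-letter stream
		}
	}
	fmt.Println("still pending:", len(p))
}
```

In this model the stranded message is retried twice, then routed to the dead-letter path on the third janitor pass, and nothing is left pending.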

Client-side caching with RESP3 broadcast invalidation

Redis 6 introduced server-assisted client-side caching over the RESP3 protocol. The pattern: clients cache reads locally, and Redis pushes invalidation messages whenever a tracked key changes. The savings are dramatic — for hot keys, you replace a network round-trip with an in-process map lookup, dropping p99 latency from 800µs to under 10µs and removing the read entirely from Redis CPU. Two tracking modes exist: default mode (server tracks every key each client reads, expensive for the server) and broadcast mode (clients subscribe to key-prefix patterns, server sends one invalidation per write fanned out to all matching subscribers). Broadcast mode is the only mode worth running at scale because server-side memory does not grow with the number of cached keys per client.

Three pitfalls bite teams adopting this. First, your local cache is eventually consistent — there is a window between a write landing on the primary and the invalidation reaching subscribers, so any read-your-writes guarantee must come from the application path that issued the write. Second, you must handle reconnects: if the tracking connection drops, you must invalidate the entire local cache before reconnecting, because invalidations during the disconnect were lost. Third, broadcast mode delivers invalidations even for keys you never read, so prefix selection matters for bandwidth.

// Client-side cache with RESP3 broadcast tracking. The tracking connection
// is separate from the command connection; invalidations arrive as RESP3
// push messages on the tracking conn and we apply them to the local map.
type CSCache struct {
    mu    sync.RWMutex
    local map[string][]byte
    rdb   *redis.Client
}
 
func (c *CSCache) Start(ctx context.Context, prefixes []string) error {
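    // BCAST mode: server sends invalidations for any key matching a prefix,
    // regardless of whether this client ever read it. NOLOOP suppresses
    // self-invalidation when this client is also the writer.
    // Caveat: tracking is per connection, so this must run on the dedicated
    // tracking connection, not on whichever pooled connection Do picks.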
    args := []any{"TRACKING", "ON", "BCAST", "NOLOOP"}
    for _, p := range prefixes {
        args = append(args, "PREFIX", p)
    }
    if err := c.rdb.Do(ctx, append([]any{"CLIENT"}, args...)...).Err(); err != nil {
        return fmt.Errorf("enable tracking: %w", err)
    }
    go c.listen(ctx) // consume invalidate push messages on the tracking conn
    return nil
}
 
func (c *CSCache) Get(ctx context.Context, key string) ([]byte, error) {
    c.mu.RLock()
    if v, ok := c.local[key]; ok {
        c.mu.RUnlock()
        return v, nil
    }
    c.mu.RUnlock()
    v, err := c.rdb.Get(ctx, key).Bytes()
    if err != nil {
        return nil, err
    }
    c.mu.Lock()
    c.local[key] = v
    c.mu.Unlock()
    return v, nil
}
 
func (c *CSCache) onInvalidate(keys []string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, k := range keys {
        delete(c.local, k)
    }
}

Bound the local cache size with an LRU — without one, a hostile or buggy access pattern will OOM the application before invalidations arrive. Track hit ratio per prefix as a Prometheus metric; a falling hit ratio is the early signal that your prefix is too coarse and you are paying invalidation bandwidth for keys nobody caches.
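
Here is a minimal sketch of the bounded local store, combining LRU eviction with the two invalidation paths described above: per-key deletes, and a full flush when the tracking connection reconnects. The boundedCache type is hypothetical, and treating a nil key list as "flush everything" is a convention of this sketch. It is not safe for concurrent use, so it would sit behind the same mutex CSCache already holds.

```go
package main

import (
	"container/list"
	"fmt"
)

// boundedCache is a small LRU keyed by Redis key.
type boundedCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding *item
}

type item struct {
	key   string
	value []byte
}

func newBoundedCache(capacity int) *boundedCache {
	return &boundedCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

func (c *boundedCache) get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*item).value, true
}

func (c *boundedCache) put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*item).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&item{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*item).key)
	}
}

// invalidate drops the listed keys. A nil slice drops everything, which this
// sketch uses for a reconnect of the tracking connection.
func (c *boundedCache) invalidate(keys []string) {
	if keys == nil {
		c.order.Init()
		c.items = map[string]*list.Element{}
		return
	}
	for _, k := range keys {
		if el, ok := c.items[k]; ok {
			c.order.Remove(el)
			delete(c.items, k)
		}
	}
}

func main() {
	c := newBoundedCache(2)
	c.put("orders:1", []byte("a"))
	c.put("orders:2", []byte("b"))
	c.get("orders:1")              // touch, so orders:2 is now the oldest
	c.put("orders:3", []byte("c")) // evicts orders:2
	_, ok := c.get("orders:2")
	fmt.Println("orders:2 cached:", ok)

	c.invalidate(nil) // tracking connection reconnected
	_, ok = c.get("orders:1")
	fmt.Println("orders:1 cached after flush:", ok)
}
```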

ACLs and TLS for multi-tenant production

Out of the box, Redis offers at most a single shared password (requirepass) and no command-level authorization, which is fine for a development laptop and disastrous for a shared production cluster. Redis 6 added ACLs: per-user credentials with command, key-pattern, and channel-pattern restrictions. The right baseline is one user per service with the narrowest possible permission set, the legacy default user disabled, and TLS terminated at the Redis port (not at a sidecar) so that credentials never traverse the network in plaintext. Below is a production-ready users.acl showing the patterns we deploy.

# Disable the default account so no client can connect without explicit creds.
user default off
 
# Read-only analytics: GET/MGET/SCAN on the analytics: prefix only,
# no write commands, no admin commands, no pub/sub.
user analytics on >REDACTED-STRONG-SECRET ~analytics:* +@read +@connection -@dangerous
 
# Application service: full data-plane access on its own keyspace,
# Streams on its own queue, no FLUSHDB/FLUSHALL/CONFIG/DEBUG/SCRIPT LOAD.
user orders-svc on >REDACTED-STRONG-SECRET ~orders:* ~orders.dlq &orders.events +@all -@dangerous -flushall -flushdb -config -debug -keys -script|load
 
# Operator account for runbooks; gated behind break-glass workflow.
# Allowed CONFIG GET but not SET, allowed CLUSTER inspection but not failover.
user oncall on >REDACTED-STRONG-SECRET ~* &* +@all -@dangerous +config|get +cluster|info -cluster|failover

Three operational rules make ACLs durable. First, version the users.acl file in a config repo and ship it via configuration management, not ACL SETUSER over the wire. Drift between nodes is the failure mode that lets an attacker keep access after a credential rotation. Second, rotate secrets through a transition user (orders-svc-v2) before deleting the old one, so the cutover does not require a deploy of every consumer at the same instant. Third, ship ACL LOG entries to your SIEM: every denied command is a misconfiguration or an attempted privilege escalation, and silence on that channel is what you want to hear. Combine ACLs with tls-port 6379, port 0 (which disables the plaintext listener), tls-auth-clients yes, and per-service client certificates so that even a leaked password on its own does not authenticate against the cluster.
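
On the client side, tls-auth-clients yes means every service has to present its own certificate. Here is a minimal sketch of building that configuration with the Go standard library. The function name and the choice to take PEM bytes rather than file paths are assumptions. The result would be assigned to the TLSConfig field of the go-redis options.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newRedisTLSConfig builds a client TLS config that presents a per-service
// certificate and trusts only the given CA bundle.
func newRedisTLSConfig(certPEM, keyPEM, caPEM []byte, serverName string) (*tls.Config, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("client certificate: %w", err)
	}
	roots := x509.NewCertPool()
	if !roots.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("CA bundle: no certificates found")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		RootCAs:      roots,
		ServerName:   serverName,
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// Placeholder input: real PEM data would come from disk or a secret store.
	_, err := newRedisTLSConfig([]byte("not a cert"), []byte("not a key"), nil, "redis.internal")
	fmt.Println("rejected invalid material:", err != nil)
}
```

Failing at startup on unreadable certificate material is deliberate: a service that silently falls back to an unauthenticated connection defeats the point of client certificates.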

Frequently Asked Questions

Why is Redis single-threaded and how does that affect scaling?

Redis processes commands on a single thread to avoid lock contention and synchronization overhead, giving predictable performance. This means a single instance peaks at roughly 100-200k ops/sec for simple commands, and you must use Redis Cluster for horizontal scaling across multiple CPU cores.

What is a Redis hot key and how do you fix it?

A hot key is a single key receiving disproportionate traffic, causing one shard to bottleneck while others sit idle. Fix it by replicating the key across multiple slots with a copy index in the key name (e.g., product:<n>:<id>), using read replicas, or restructuring data to distribute load.

When should you use Redis Sentinel vs Redis Cluster?

Use Redis Sentinel for high availability with datasets under 25GB and under 100k ops/sec — it provides automatic failover for a single primary. Use Redis Cluster when you need horizontal scaling beyond one machine's RAM or throughput, as it shards data across multiple primaries.

How do you use Redis pipelining to improve throughput?

Pipelining batches multiple Redis commands into a single network round trip instead of waiting for each response individually. This reduces network overhead dramatically — a batch of 100 pipelined commands can be 10-50x faster than 100 sequential commands. ^{[Redis pipelining]}

Keep Reading

Caching Strategies at Scale: The Complete Guide Beyond Key-Value Stores — Cache-aside, write-through, write-behind patterns; cache stampede prevention; event-driven invalidation
Database Indexing Strategies: B-Trees, GIN, GiST, and Production Tuning — Before reaching for Redis, index your queries properly — a 2ms indexed query doesn't need a cache layer
Understanding Raft Consensus — How Redis Cluster's gossip topology compares to Raft consensus for distributed system reliability
Rate Limiter Algorithms — Token bucket, sliding window log, and sliding window counter — every algorithm uses Redis for shared state across pods
Distributed Rate Limiting (Probabilistic Drop) — When per-request Redis-Lua adds too much latency, drop_ratio gives you global enforcement with local in-memory checks
