Before we shard, we need to understand why a single instance fails. It's usually not memory: the bottleneck is either CPU, because Redis executes commands on a single-threaded event loop, or saturated network bandwidth.
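A quick way to see which resource is pegged is to sample the INFO counters. This is a minimal sketch with redis-py; the endpoint name is illustrative, but the INFO field names are the real ones:

```python
import redis

# Endpoint is illustrative; point this at the instance under investigation.
client = redis.Redis(host='redis-primary', port=6379)

stats = client.info('stats')
cpu = client.info('cpu')

# A saturated event loop shows high ops/sec with climbing CPU counters;
# bandwidth saturation shows high instantaneous_*_kbps at modest CPU.
print('ops/sec:  ', stats['instantaneous_ops_per_sec'])
print('net in:   ', stats['instantaneous_input_kbps'], 'kbps')
print('net out:  ', stats['instantaneous_output_kbps'], 'kbps')
print('cpu sys:  ', cpu['used_cpu_sys'])
print('cpu user: ', cpu['used_cpu_user'])
```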
```conf
# redis.conf - Basic Redis Cluster Configuration
port 6379
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

# Performance Tuning
maxmemory 4gb
maxmemory-policy allkeys-lru
```

We use a 3-primary, 3-replica setup to ensure high availability. Hash slots (16,384 total) are distributed evenly across the primary nodes.
```mermaid
graph TD
    Client[Application Layer]
    subgraph RC [Redis Cluster]
        P1["Primary 1<br/>Slots 0-5460"]
        P2["Primary 2<br/>Slots 5461-10922"]
        P3["Primary 3<br/>Slots 10923-16383"]
        R1[Replica 1] -.-> P1
        R2[Replica 2] -.-> P2
        R3[Replica 3] -.-> P3
    end
    Client -->|"CRC16(key) % 16384"| Hash{Slot Lookup}
    Hash --> P1
    Hash --> P2
    Hash --> P3
    style P1 fill:#ffcdd2,stroke:#b71c1c,color:#000
    style P2 fill:#ffcdd2,stroke:#b71c1c,color:#000
    style P3 fill:#ffcdd2,stroke:#b71c1c,color:#000
```
Each primary has a dedicated replica for automatic failover.
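The slot math in the diagram is easy to reproduce. Here is a minimal sketch of the routing computation, assuming the CRC16 variant the cluster spec uses (XMODEM: polynomial 0x1021, zero initial value) plus hash-tag handling; `crc16_xmodem` and `key_slot` are illustrative names, not a library API:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM: polynomial 0x1021, init 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 cluster hash slots."""
    # Hash tags: if the key contains a non-empty {...} section, only
    # that substring is hashed, forcing related keys into one slot.
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

Because only the tag is hashed, `user:{42}:profile` and `user:{42}:settings` land in the same slot, which is what makes multi-key operations on related keys possible in a cluster.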
Cache invalidation is famously the hardest problem in computer science. We opted for a hybrid approach.
When a popular cache key expires, thousands of concurrent requests might simultaneously miss the cache and hit the database. This is known as the "Thundering Herd" problem.
To mitigate this, always add jitter to your TTLs:
```python
import random

def set_with_jitter(client, key, value, ttl_seconds):
    """
    Sets a key with a random jitter to prevent thundering herds.
    """
    # Add ±15% jitter to the TTL
    jitter = ttl_seconds * 0.15
    actual_ttl = ttl_seconds + random.uniform(-jitter, jitter)
    # Floor at 1s so small TTLs can't round down to an invalid expiry
    client.setex(key, max(1, int(actual_ttl)), value)
```
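Jitter spreads expirations out, but it does not stop a stampede once a single hot key has already expired. A complementary pattern is to let exactly one caller recompute while the rest wait briefly. This sketch uses a plain SET NX EX lock; `get_with_lock` and the `recompute` callback are illustrative names, and it builds on `set_with_jitter` above:

```python
import time

def get_with_lock(client, key, recompute, ttl_seconds, lock_ttl=10):
    while True:
        # Fast path: serve from cache when possible.
        value = client.get(key)
        if value is not None:
            return value

        # SET NX EX: exactly one caller wins the right to recompute;
        # the lock expires on its own if that caller dies mid-flight.
        if client.set(f"lock:{key}", "1", nx=True, ex=lock_ttl):
            try:
                value = recompute()
                set_with_jitter(client, key, value, ttl_seconds)
                return value
            finally:
                client.delete(f"lock:{key}")

        # Everyone else polls the cache instead of hitting the database.
        time.sleep(0.05)
```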
Connection management matters just as much. Create one pool per process and share it, with explicit timeouts and periodic health checks so dead connections are detected before a request lands on them:

```python
import redis

pool = redis.ConnectionPool(
    host='redis-cluster',
    port=6379,
    max_connections=50,
    socket_connect_timeout=2,
    socket_timeout=2,
    retry_on_timeout=True,
    health_check_interval=30,
)
client = redis.Redis(connection_pool=pool)
```

Note that the plain `redis.Redis` client does not follow cluster MOVED/ASK redirects; when talking directly to cluster nodes rather than through a proxy, use redis-py's `RedisCluster` client, which maintains per-node pools itself.

Before going to production, you must validate your cluster's performance. Use redis-benchmark to simulate load:
```bash
# Simulate 100k requests with 50 concurrent clients
redis-benchmark -h redis-cluster -p 6379 -t set,get -n 100000 -c 50 -q

# Test pipeline performance (16 commands per pipeline)
redis-benchmark -h redis-cluster -p 6379 -t set,get -n 100000 -P 16 -q
```

Key targets depend on your workload; at minimum, confirm that throughput comfortably exceeds your projected peak and that latencies stay in the low single-digit milliseconds.
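On the application side, the same batching the `-P 16` test measures is available through redis-py pipelines. A brief sketch, reusing the pooled `client` from above:

```python
# Batch commands client-side to cut round trips, mirroring the -P 16 test.
with client.pipeline(transaction=False) as pipe:
    for i in range(16):
        pipe.set(f'key:{i}', i)
    results = pipe.execute()  # one round trip for all 16 commands
```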
Essential metrics to track: the keyspace hit ratio, eviction rate, connected clients, memory usage, and command latency.
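As a sketch of how to pull these, the dictionary keys read below are the actual INFO field names, while `cache_health` itself is an illustrative helper, not a library function:

```python
def cache_health(client):
    """Snapshot the INFO fields that matter most for a cache tier."""
    info = client.info()  # merges the default INFO sections into one dict
    hits = info['keyspace_hits']
    misses = info['keyspace_misses']
    lookups = hits + misses
    return {
        'hit_ratio': hits / lookups if lookups else None,
        'evicted_keys': info['evicted_keys'],
        'connected_clients': info['connected_clients'],
        'used_memory_human': info['used_memory_human'],
    }
```

A hit ratio that drifts down alongside rising evicted_keys usually means maxmemory is too small for the working set.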
Scaling isn't just about adding more nodes; it's about understanding how data flows through them. With proper sharding and connection management, a Redis cluster can comfortably sustain millions of operations per second.