Consensus algorithms are the foundation of every replicated database, distributed lock service, and configuration store. Raft was designed to be understandable -- and that makes it an excellent starting point for learning about distributed consensus.
In a distributed system, you want multiple servers to agree on a sequence of values (the replicated log). Even if some servers crash or network partitions occur, the remaining servers should continue to operate correctly and consistently.
Raft achieves this by electing a leader that coordinates all changes. As long as a majority of servers are available, the system continues to make progress.
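To make the majority requirement concrete, here is a tiny illustrative helper (not from the original text): a cluster of n servers makes progress as long as a strict majority is reachable, so a 3-node cluster tolerates 1 failure and a 5-node cluster tolerates 2.

// quorum returns the number of servers that must agree: a strict majority.
func quorum(clusterSize int) int {
    return clusterSize/2 + 1
}

// faultsTolerated returns how many servers can fail while the cluster
// still has a quorum available.
func faultsTolerated(clusterSize int) int {
    return (clusterSize - 1) / 2
}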
graph TD
    subgraph "Raft State Machine"
        F[Follower]
        C[Candidate]
        L[Leader]
        F -->|Timeout| C
        C -->|Majority Votes| L
        L -->|Discover Higher Term| F
        C -->|Discover Higher Term| F
        C -->|Timeout| C
    end
    style L fill:#fff9c4,stroke:#fbc02d
    style C fill:#ffe0b2,stroke:#f57c00
    style F fill:#e1f5fe,stroke:#0288d1
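The listings below reference a NodeState and a LogEntry type without defining them. A minimal sketch of what they could look like follows; the three states map directly onto the diagram above, and the Command field is an assumption (the original does not show the entry payload).

// NodeState mirrors the three roles in the state machine above.
type NodeState int

const (
    Follower NodeState = iota
    Candidate
    Leader
)

// LogEntry is one slot in the replicated log: the term in which the entry
// was created, plus an opaque command for the state machine.
type LogEntry struct {
    Term    int
    Command interface{}
}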
Raft divides time into terms. Each term begins with an election. A node starts an election by incrementing its term, voting for itself, and requesting votes from peers.
type RaftNode struct {
    mu          sync.Mutex
    id          int
    currentTerm int       // Latest term this node has seen
    votedFor    int       // Candidate voted for in currentTerm (-1 if none)
    state       NodeState // Follower, Candidate, Leader
    log         []LogEntry
    commitIndex int       // Highest log index known to be committed
    peers       []int     // IDs of the other servers in the cluster

    // Leader-only state, used by the replication code further below.
    nextIndex  map[int]int // Next log index to send to each peer
    matchIndex map[int]int // Highest index known to be replicated on each peer
}
func (n *RaftNode) startElection() {
    n.mu.Lock()
    n.currentTerm++
    n.votedFor = n.id
    n.state = Candidate
    term := n.currentTerm
    lastLogIndex := len(n.log) - 1
    lastLogTerm := 0
    if lastLogIndex >= 0 {
        lastLogTerm = n.log[lastLogIndex].Term
    }
    n.mu.Unlock()

    votes := 1 // Self-vote
    var voteMu sync.Mutex
    for _, peer := range n.peers {
        go func(p int) {
            reply := n.sendRequestVote(p, &RequestVoteArgs{
                Term:         term,
                CandidateId:  n.id,
                LastLogIndex: lastLogIndex,
                LastLogTerm:  lastLogTerm,
            })
            // If a peer reports a newer term, this candidacy is stale:
            // adopt the term and revert to follower.
            if reply.Term > term {
                n.mu.Lock()
                if reply.Term > n.currentTerm {
                    n.currentTerm = reply.Term
                    n.state = Follower
                    n.votedFor = -1
                }
                n.mu.Unlock()
                return
            }
            if reply.VoteGranted {
                voteMu.Lock()
                votes++
                // Trigger the transition exactly once, when the majority
                // threshold is crossed. (A full implementation would also
                // re-check that the node is still a candidate in this term.)
                wonElection := votes == (len(n.peers)+1)/2+1
                voteMu.Unlock()
                if wonElection {
                    n.becomeLeader()
                }
            }
        }(peer)
    }
}

A candidate wins the election if it receives votes from a majority of servers. If no candidate wins (a split vote), each candidate waits out a randomized timeout and starts a new election; the randomization makes repeated ties unlikely.
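The randomized timeout can be as simple as the sketch below. The 150-300 ms interval is an assumption for illustration; in practice it just needs to be comfortably larger than a typical round-trip time.

import (
    "math/rand"
    "time"
)

// electionTimeout returns a randomized duration (here 150-300 ms).
// Randomization means two candidates rarely keep timing out in lockstep:
// whoever fires first usually wins the next election.
func electionTimeout() time.Duration {
    return time.Duration(150+rand.Intn(150)) * time.Millisecond
}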
sequenceDiagram
    participant C as Candidate
    participant F as Follower
    Note over C: Term++ (Vote Self)
    C->>F: RequestVote(Term, lastLogIndex)
    alt Term < currentTerm
        F-->>C: VoteDenied (Term outdated)
    else Log ok & Not Voted
        Note over F: VotedFor = C
        F-->>C: VoteGranted
    end
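The follower side of this exchange is not listed above; one way to write it against the RaftNode type is sketched below. The RequestVoteReply fields (Term, VoteGranted) are assumed to mirror the args.

// HandleRequestVote implements the follower side of the diagram: reject stale
// terms, step down on newer terms, and grant at most one vote per term to a
// candidate whose log is at least as up to date as ours.
func (n *RaftNode) HandleRequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
    n.mu.Lock()
    defer n.mu.Unlock()

    if args.Term < n.currentTerm {
        reply.Term = n.currentTerm
        reply.VoteGranted = false // Candidate's term is outdated
        return
    }
    if args.Term > n.currentTerm {
        // Newer term: adopt it and revert to follower before deciding the vote.
        n.currentTerm = args.Term
        n.state = Follower
        n.votedFor = -1
    }

    // Election restriction: the candidate's log must be at least as up to date.
    lastIndex := len(n.log) - 1
    lastTerm := 0
    if lastIndex >= 0 {
        lastTerm = n.log[lastIndex].Term
    }
    logOK := args.LastLogTerm > lastTerm ||
        (args.LastLogTerm == lastTerm && args.LastLogIndex >= lastIndex)

    if logOK && (n.votedFor == -1 || n.votedFor == args.CandidateId) {
        n.votedFor = args.CandidateId
        reply.VoteGranted = true
    }
    reply.Term = n.currentTerm
}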
Once a leader is elected, it accepts client requests, appends entries to its log, and replicates them to followers using AppendEntries RPCs.
func (n *RaftNode) appendEntries(peer int, args *AppendEntriesArgs) {
    reply := n.sendAppendEntries(peer, args)
    // Note: in a complete implementation, nextIndex, matchIndex and
    // commitIndex updates would be guarded by n.mu.
    if reply.Success {
        // The follower accepted the entries: advance our bookkeeping for it.
        n.nextIndex[peer] = args.PrevLogIndex + len(args.Entries) + 1
        n.matchIndex[peer] = n.nextIndex[peer] - 1
        n.advanceCommitIndex()
    } else if n.nextIndex[peer] > 0 {
        // The follower's log diverges: back up one entry and retry next round.
        n.nextIndex[peer]--
    }
}

An entry is committed once a majority of servers have replicated it. Committed entries are safe -- they will never be lost even if servers crash, because any future leader must hold them in its log.
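The advanceCommitIndex helper called above is not shown in the listing. One possible implementation, consistent with the fields defined earlier, scans backwards for the highest index replicated on a majority; only entries from the leader's current term are committed by counting, as the Raft paper requires.

// advanceCommitIndex moves commitIndex forward to the highest log index that
// a majority of the cluster has replicated, restricted to entries from the
// leader's current term.
func (n *RaftNode) advanceCommitIndex() {
    for idx := len(n.log) - 1; idx > n.commitIndex; idx-- {
        if n.log[idx].Term != n.currentTerm {
            break // Entries from earlier terms are never committed by counting
        }
        count := 1 // The leader itself holds the entry
        for _, peer := range n.peers {
            if n.matchIndex[peer] >= idx {
                count++
            }
        }
        if count > (len(n.peers)+1)/2 {
            n.commitIndex = idx
            return
        }
    }
}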
sequenceDiagram
    participant C as Client
    participant L as Leader
    participant F as Follower
    C->>L: Command(X)
    L->>L: Log Append(X)
    par Replicate
        L->>F: AppendEntries(Entries=[X])
        F->>F: Log Append(X)
        F-->>L: Success
    end
    L->>L: Commit(X)
    L-->>C: Response
Raft provides several important safety guarantees: Election Safety ensures at most one leader per term. Leader Append-Only means a leader never overwrites or deletes entries in its log. Log Matching guarantees that if two logs contain an entry with the same index and term, all preceding entries are identical.
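Log Matching is enforced by the consistency check a follower runs on every AppendEntries call; a sketch of that check follows. The Term, PrevLogTerm and LeaderCommit fields on the args, and the AppendEntriesReply type, are assumptions beyond what the earlier listings show.

// HandleAppendEntries sketches the follower-side consistency check behind the
// Log Matching property: entries are appended only if the follower's log
// agrees with the leader's at PrevLogIndex/PrevLogTerm.
func (n *RaftNode) HandleAppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
    n.mu.Lock()
    defer n.mu.Unlock()

    reply.Term = n.currentTerm
    if args.Term < n.currentTerm {
        reply.Success = false // Stale leader
        return
    }
    // A valid AppendEntries from the current leader resets us to follower.
    n.currentTerm = args.Term
    n.state = Follower

    // Consistency check: we must hold an entry at PrevLogIndex whose term
    // matches PrevLogTerm.
    if args.PrevLogIndex >= 0 &&
        (args.PrevLogIndex >= len(n.log) || n.log[args.PrevLogIndex].Term != args.PrevLogTerm) {
        reply.Success = false
        return
    }

    // Replace any conflicting suffix with the leader's entries. (A full
    // implementation truncates only from the first conflicting entry.)
    n.log = append(n.log[:args.PrevLogIndex+1], args.Entries...)
    if args.LeaderCommit > n.commitIndex {
        n.commitIndex = args.LeaderCommit
        if last := len(n.log) - 1; last < n.commitIndex {
            n.commitIndex = last
        }
    }
    reply.Success = true
}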
Raft powers many production systems. etcd, the datastore behind the Kubernetes control plane, replicates its data with Raft. CockroachDB uses multi-Raft, running thousands of independent Raft groups for fine-grained replication. TiKV uses Raft in its distributed key-value storage layer.
The most common mistakes when implementing Raft are: not persisting currentTerm, votedFor, and the log before responding to RPCs; not handling split votes correctly; and not implementing log compaction (snapshotting). Without compaction the log grows without bound and new servers take a very long time to catch up.
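As a sketch of the first pitfall: the durable state must reach stable storage before a node answers a RequestVote or AppendEntries, otherwise a crash can make it vote twice in the same term or drop acknowledged entries. The file format and path below are arbitrary assumptions for illustration.

import (
    "encoding/json"
    "os"
)

// persistentState is the subset of RaftNode that must survive a crash.
type persistentState struct {
    CurrentTerm int
    VotedFor    int
    Log         []LogEntry
}

// persist writes the node's durable state to disk. It must complete before
// the node replies to an RPC that changed any of these fields.
func (n *RaftNode) persist(path string) error {
    data, err := json.Marshal(persistentState{
        CurrentTerm: n.currentTerm,
        VotedFor:    n.votedFor,
        Log:         n.log,
    })
    if err != nil {
        return err
    }
    // os.WriteFile alone is a simplification; a real implementation would
    // fsync and write atomically (temp file, then rename).
    return os.WriteFile(path, data, 0o644)
}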
Raft makes distributed consensus approachable. Understanding its mechanics helps you reason about the behaviour of systems like etcd, CockroachDB, and TiKV. While you may never need to implement Raft yourself, knowing how it works makes you a better distributed systems engineer.