Consistent Hashing: The 'Locker Room' Mental Model
How does Cassandra know which server stores your data? A mastery guide to consistent hashing, virtual nodes, and why your cache doesn't invalidate when a server is added.
You have 3 cache servers. You use server = hash(key) % 3 to decide which server stores each key.
You add a 4th server. Now hash(key) % 4 gives different results. Every cached key maps to a new server. Your cache hit rate drops from 90% to near 0%. Your database collapses.
This is why simple modular hashing breaks when you scale. Consistent Hashing is the solution used by Redis Cluster, Cassandra, DynamoDB, and every CDN.
Part 1: Foundations (The Mental Model)
Simple Hashing = The School Locker Assignment
Imagine 1,000 students and 10 lockers. locker = student_id % 10.
Simple. Uniform. Works perfectly.
Problem: A new locker (server) is added. Now student_id % 11. Almost every student gets a different locker. Everyone is locked out of their old locker. Mass confusion.
Consistent Hashing = The Clock/Ring
Instead of % servers, imagine a giant clock ring from 0 to 360 degrees (or 0 to 2^32).
- Place servers on the ring: Hash each server’s name → position on the ring.
- Place keys on the ring: Hash each key → position on the ring.
- Find the server: Walk clockwise from the key. The first server you hit owns that key.
0° (Server A)
/
360°--Ring--90°
\ (Server B)
180° (Server C)
Key "user:123" hashes to 45° → Walk clockwise → First server = Server A (at 0°/360°, wraps)
When a server is added/removed, only the keys between it and its predecessor on the ring are affected. Everything else stays put.
Old way: Add server → ~100% of keys remapped. Consistent Hashing: Add server → ~1/N keys remapped (where N = number of servers).
Part 2: The Investigation (Virtual Nodes)
A naive ring has one problem: uneven distribution. By chance, one server might cover 1° and another might cover 180°.
Virtual Nodes solve this. Instead of each server having 1 position, it has 100+ positions on the ring (with different hash seeds).
Physical Servers: A, B, C
Virtual Nodes:
Ring: A₁ B₂ C₃ A₄ B₅ C₆ A₇ B₈ C₉ A₁₀ ...
(100 virtual nodes each, evenly distributed)
Each key still walks clockwise to find its server. But now each physical server handles a well-distributed portion of the ring — even if servers have different capacities (a bigger server gets more virtual nodes).
Part 3: The Diagnosis (Common Problems)
| Problem | Cause | Fix |
|---|---|---|
| Uneven load | Few servers, bad hash collisions | Add virtual nodes (100-150 per server) |
| Hot spots | One key gets 90% of traffic | Shard the key: user:123:shard_{0-9} |
| Server fails, keys lost | No replication | Replicate to N servers clockwise on the ring |
| Stale reads after topology change | Client still has old ring state | Use gossip protocol (Cassandra) or centralized config (Zookeeper) |
Part 4: The Resolution (Where Consistent Hashing Is Used)
1. Redis Cluster
Redis Cluster uses hash slots (0–16383) — a simplified consistent hashing variant.
redis-cli cluster info
redis-cli cluster nodes # See which server owns which hash slots
2. Cassandra
Uses a full consistent hash ring with virtual nodes (“vnodes”). Each node’s token range determines which rows it stores.
3. Your Own Load Balancer (Python)
import hashlib
from bisect import bisect_right
class ConsistentHashRing:
def __init__(self, servers: list, replicas: int = 150):
self.ring = {}
self.sorted_keys = []
for server in servers:
for i in range(replicas):
# Create 150 virtual nodes per server
virtual_key = f"{server}:{i}"
h = int(hashlib.md5(virtual_key.encode()).hexdigest(), 16)
self.ring[h] = server
self.sorted_keys.append(h)
self.sorted_keys.sort()
def get_server(self, key: str) -> str:
h = int(hashlib.md5(key.encode()).hexdigest(), 16)
idx = bisect_right(self.sorted_keys, h) % len(self.sorted_keys)
return self.ring[self.sorted_keys[idx]]
# Usage
ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
print(ring.get_server("user:123")) # → "server-b"
print(ring.get_server("user:456")) # → "server-a"
# Add a new server
ring2 = ConsistentHashRing(["server-a", "server-b", "server-c", "server-d"])
# Only ~25% of keys change their server!
Final Mental Model
Simple Hashing (% N) -> School lockers. Add one locker: everyone loses their stuff.
Consistent Hashing -> Clock Ring. Add a server: only nearby keys are affected.
Virtual Nodes -> Each server on the ring 150 times. Evenly distributed load.
hash(key) % N -> 100% remapped when N changes.
Consistent Hashing -> ~1/N remapped when N changes. (Critical for large caches).
When you need it:
- Distributed caches (Redis Cluster, Memcached).
- Distributed databases (Cassandra, DynamoDB).
- Load balancing stateful connections (sticky sessions without a lookup table).
Related posts
-
Caching & Redis: The 'Sticky Note' Mental Model
Why does Redis make everything faster? A mastery guide to cache invalidation (the hardest problem in CS), eviction strategies, and Redis data types.
-
CDN: The 'Local Convenience Store' Mental Model
Why does your image load instantly for users in the US but crawls in Vietnam? A mastery guide to CDN Edge Nodes, Cache-Control headers, and cache busting.
-
Task Queues & Message Brokers: Celery, RabbitMQ, and Kafka Untangled
Why does sending an email block your API? A mastery guide to async task queues (Celery/Django-Q), message brokers (RabbitMQ), and event streaming (Kafka).
-
Rate Limiting & Circuit Breaker: The 'Traffic Light & Fuse Box' Mental Model
How do you stop one bad client from taking down your entire API? A mastery guide to rate limiting strategies, circuit breakers, and resilience patterns.