Distributed Cache
Overview
Designing a Distributed Cache (like Memcached or Redis Cluster) tests your understanding of caching strategies, consistency, and fault tolerance at scale. Caches sit in front of nearly every high-traffic system, and the question reveals how you think about eviction policies (LRU, LFU, TTL), partitioning (consistent hashing), and handling cache misses and stampedes. This design matters in interviews because it's a building block for many other designs—URL shortener, news feed, rate limiter—and the trade-offs (write-through vs write-back, cache-aside vs read-through) apply broadly. Showing you understand cache invalidation (famously one of the two hard problems in computer science) and how to avoid a thundering herd demonstrates practical systems knowledge.
Requirements
Functional
- Get(key) and Set(key, value, ttl?)
- Delete(key)
- Support TTL and eviction (LRU, LFU)
- Atomic operations (e.g., INCR)
- Optional: pub/sub for invalidation
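Taken together, the functional requirements amount to a small key-value API. A minimal single-node sketch in Python (class and method names are illustrative, not any particular product's API):

```python
import time

class TTLCache:
    """Minimal single-node cache covering the functional requirements above."""

    def __init__(self):
        self._store = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._store[key] = (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._store[key]  # lazy expiration on read
            return None
        return value

    def delete(self, key):
        self._store.pop(key, None)

    def incr(self, key, delta=1):
        # Atomic enough for this single-process sketch; a real server
        # serializes commands on one event loop (as Redis does) or locks.
        value, expires_at = self._store.get(key, (0, None))
        value += delta
        self._store[key] = (value, expires_at)
        return value
```

Real caches expire keys both lazily (as above) and via periodic background sweeps, so memory from dead keys is reclaimed even if they are never read again.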
Non-Functional
- Sub-millisecond latency
- High throughput — millions of ops/sec
- Fault tolerance — survive node failures
- Scalability — add nodes without full rehash
Capacity Estimation
Assume 1 TB of cached data and 1 M QPS. With ~1 GB of usable memory per node, that's roughly 1,000 nodes, and each node serves only ~1,000 ops/sec on average. Use consistent hashing so that adding or removing a node reshuffles only a small fraction of keys.
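The back-of-envelope math, in code (binary units assumed, so the node count comes out at 1,024 rather than an even 1,000):

```python
total_cache_bytes = 1 * 1024**4   # 1 TB of cached data (binary units)
node_memory_bytes = 1 * 1024**3   # 1 GB usable per node (assumed)

nodes = total_cache_bytes // node_memory_bytes   # 1024, i.e. ~1,000 nodes

qps_total = 1_000_000
qps_per_node = qps_total / nodes                 # under 1,000 ops/sec/node

print(nodes, round(qps_per_node))
```

Each node is lightly loaded at these numbers; in practice the sizing is usually driven by memory footprint and failure-domain isolation rather than CPU.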
Architecture Diagram
Component Deep Dive
Cache Client
Library that hashes each key to a node, handles connection pooling and retries, and implements consistent hashing.
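A minimal consistent-hash ring as the client might implement it (the node names, the use of MD5, and the virtual-node count are all illustrative choices):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes for a cache client."""

    def __init__(self, nodes, vnodes=100):
        # Place vnodes points per physical node on the ring.
        points = []
        for node in nodes:
            for i in range(vnodes):
                points.append((self._hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First virtual node clockwise from the key's position (wraps around).
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

Removing a node only reassigns the keys that lived on that node's virtual points; everything else keeps its mapping, which is the property that makes scaling cheap.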
Cache Node
In-memory store with LRU/LFU eviction. Supports get, set, delete, and TTL. Can be single-threaded (Redis-style: simple, no locking) or multi-threaded (Memcached-style: scales across cores).
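LRU eviction is commonly built on an ordered map, giving O(1) get and set. A sketch using Python's `OrderedDict` (the capacity is illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """Single-node store with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```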
Cluster Manager
Tracks membership and detects failures, typically via a gossip protocol. Triggers rebalancing when nodes are added or removed.
Replication (optional)
Primary-replica replication for availability. Asynchronous replication keeps writes fast but can lose recent writes on failover. Replicas can also serve reads to scale read throughput.
Persistence (optional)
Periodic snapshots (RDB-style) or an append-only file (AOF) for durability. Trade-off: snapshot frequency and fsync policy balance write performance against how much data a crash can lose.
Database Design
Cache is in-memory; no traditional DB. Optional: persist to disk (Redis RDB/AOF). Metadata (cluster topology) in ZooKeeper/etcd.
API Design
| Method | Path | Description |
|---|---|---|
| GET | /get?key= | Get value. Returns 200 + value, or 404 on miss. |
| PUT | /set | Set key-value. Body: {key, value, ttl?}. Returns 200. |
| DELETE | /delete?key= | Delete key. Returns 200. |
Scalability & Trade-offs
- Write-through vs write-back: Write-through is simpler, always consistent; write-back is faster but risks data loss.
- Cache-aside vs read-through: Cache-aside gives app control; read-through simplifies app logic.
- Consistent hashing: Minimizes data movement on scale; virtual nodes reduce load imbalance.
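Cache-aside combined with per-key single-flight locking also addresses the thundering-herd problem from the overview: on a miss, only one caller loads from the database while the rest wait and reread the cache. A sketch, assuming `cache` exposes get/set and `db_load(key)` is the backing-store read (both names are placeholders):

```python
import threading

_locks = {}                       # per-key locks for single-flight misses
_locks_guard = threading.Lock()   # protects the _locks dict itself

def cache_aside_get(cache, db_load, key, ttl=300):
    """Cache-aside read with stampede protection."""
    value = cache.get(key)
    if value is not None:
        return value              # hit: no lock needed
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)    # another caller may have filled it
        if value is None:
            value = db_load(key)          # only one caller reaches the DB
            cache.set(key, value, ttl)
    return value
```

Other common stampede defenses include jittered TTLs (so hot keys don't all expire at once) and serving slightly stale data while one request refreshes in the background.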