Key-Value Store
Overview
Designing a distributed key-value store (like Redis or DynamoDB) is a staple system design question at companies building large-scale systems. Amazon, Google, and Meta all operate massive KV stores, and understanding partitioning, replication, consistency models, and failure handling is essential. The question tests whether you can reason about the CAP theorem, eventual versus strong consistency, and how to scale storage horizontally. It matters in interviews because KV stores underpin nearly every high-performance system (caching, session storage, feature flags), and the trade-offs you discuss (consistency vs. availability, read vs. write scaling) apply broadly across distributed systems.
Requirements
Functional
- Put(key, value) and Get(key) operations
- Delete(key) support
- Support for key expiration (TTL)
- Range queries or prefix scans (optional)
- Atomic compare-and-set or conditional writes
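The functional surface above is small enough to sketch end to end. The snippet below is a single-node, in-memory illustration (class and method names are our own, not from any particular store): TTL is enforced lazily on read, and compare-and-set is atomic here only because a single thread runs it.

```python
import time
from typing import Any, Optional

class KVStore:
    """Single-node, in-memory sketch of the functional API."""

    def __init__(self) -> None:
        # key -> (value, absolute expiry time or None)
        self._data: dict[str, tuple[Any, Optional[float]]] = {}

    def put(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expiry = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expiry)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if expiry is not None and time.monotonic() >= expiry:
            del self._data[key]          # lazy TTL eviction on read
            return None
        return value

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None

    def compare_and_set(self, key: str, expected: Any, new: Any) -> bool:
        # Atomic only in this single-threaded sketch; a real store needs
        # per-key locking or consensus on the owning replica.
        if self.get(key) != expected:
            return False
        self.put(key, new)
        return True
```

In a distributed setting, each of these calls additionally routes through partitioning and replication, covered below.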
Non-Functional
- Low latency — single-digit ms for get/put
- High throughput — millions of ops per second
- Durability — data survives node failures
- Scalability — petabyte-scale storage
Capacity Estimation
Assume ~1 trillion keys at 1KB avg value ≈ 1PB of raw data (note: 1B keys at 1KB would be only ~1TB). Traffic: 100K writes/s, 1M reads/s. Replication 3x → 3PB total storage.
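The arithmetic is worth spelling out, since 1 billion keys at 1KB is only about 1TB; petabyte scale implies on the order of a trillion keys. A quick back-of-envelope check, using the traffic figures assumed above:

```python
keys = 10**12            # ~1 trillion keys (petabyte scale implies this order)
avg_value = 1_000        # 1 KB average value, in bytes
replication = 3

raw = keys * avg_value
total = raw * replication
print(f"raw: {raw / 1e15:.0f} PB, replicated: {total / 1e15:.0f} PB")  # → raw: 1 PB, replicated: 3 PB

# Bandwidth at the stated traffic rates:
write_bw = 100_000 * avg_value      # 100 MB/s of incoming writes (before replication)
read_bw = 1_000_000 * avg_value     # 1 GB/s of reads (before any caching)
```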
Architecture Diagram
Component Deep Dive
API Layer
Handles client connections, routes requests to appropriate partition. Supports REST or custom binary protocol.
Partitioning Service
Consistent hashing or range-based partitioning. Maps key to partition/shard. Handles rebalancing on scale-out.
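A minimal consistent-hash ring with virtual nodes might look like the following (names and the choice of MD5 are illustrative; production systems typically use faster hashes such as Murmur or xxHash):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes (sketch)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._entries: list[tuple[int, str]] = []   # sorted (hash, node) pairs
        self._hashes: list[int] = []
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str, vnodes: int = 100) -> None:
        # Each physical node owns many points on the ring, which spreads
        # load evenly and smooths rebalancing when nodes join or leave.
        for i in range(vnodes):
            self._entries.append((self._hash(f"{node}#{i}"), node))
        self._entries.sort()
        self._hashes = [h for h, _ in self._entries]

    def lookup(self, key: str) -> str:
        # A key is owned by the first ring point at or after its hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._entries)
        return self._entries[idx][1]
```

Lookups are O(log V) in the total number of virtual nodes, and adding a node only reassigns the key ranges adjacent to its ring points.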
Storage Node
Stores key-value pairs. Uses LSM tree or B-tree. Handles compaction, garbage collection.
Replication Layer
Replicates data across nodes. Leader-follower or multi-leader. Handles failover.
Gossip / Membership
Cluster membership, failure detection. Nodes discover each other and detect failures.
Coordinator
Handles cross-partition transactions if needed. Coordinates quorum reads/writes.
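Quorum reads and writes are the standard Dynamo-style mechanism here: with N replicas, writing to W and reading from R guarantees overlap whenever R + W > N, so a read always intersects the latest acknowledged write. A sketch (plain version numbers stand in for vector clocks or timestamps):

```python
import random

N, W, R = 3, 2, 2                       # R + W > N → quorums always overlap
replicas = [dict() for _ in range(N)]   # each replica: key -> (value, version)

def quorum_write(key, value, version):
    # Acknowledge after W replicas accept; the rest catch up later
    # via anti-entropy / read repair.
    for i in random.sample(range(N), W):
        replicas[i][key] = (value, version)

def quorum_read(key):
    # Contact R replicas and return the highest-versioned value seen.
    sampled = random.sample(replicas, R)
    found = [rep[key] for rep in sampled if key in rep]
    return max(found, key=lambda t: t[1])[0] if found else None

quorum_write("k", "v1", 1)
quorum_write("k", "v2", 2)
# Because R + W > N, every read quorum includes at least one replica
# holding version 2, so every read returns "v2":
assert all(quorum_read("k") == "v2" for _ in range(100))
```

Dropping to W=1, R=1 trades this guarantee for lower latency and higher availability, which is exactly the eventual-consistency trade-off discussed below.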
Database Design
Each storage node uses an LSM-tree (LevelDB, RocksDB) or B-tree. Partition key determines shard. Replication via WAL + async/sync replica. Metadata (partition map) in ZooKeeper or etcd.
API Design
| Method | Path | Description |
|---|---|---|
| PUT | /v1/keys/{key} | Store key-value. Body: {value, ttl?}. Returns 200. |
| GET | /v1/keys/{key} | Retrieve value. Returns 200 + value or 404. |
| DELETE | /v1/keys/{key} | Delete key. Returns 200. |
| GET | /v1/keys?prefix={p} | List keys with prefix (if supported). |
Scalability & Trade-offs
- Consistency vs availability: Strong consistency (linearizable) requires quorum and adds latency; eventual consistency allows higher availability.
- LSM vs B-tree: LSM has better write throughput; B-tree has better read latency and no compaction stalls.
- Partitioning: Consistent hashing minimizes rebalancing; range partitioning allows efficient range queries.
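The rebalancing point is easy to demonstrate: with naive modulo placement (node = hash(key) % N), growing a cluster from 4 to 5 nodes remaps roughly 80% of all keys, whereas consistent hashing moves only the ~1/5 that the new node takes over. A quick check of the modulo case:

```python
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"key-{i}" for i in range(10_000)]
before = {k: h(k) % 4 for k in keys}   # 4-node cluster, modulo placement
after = {k: h(k) % 5 for k in keys}    # add a 5th node
moved = sum(before[k] != after[k] for k in keys) / len(keys)
print(f"{moved:.0%} of keys moved")    # ≈ 80%
# Consistent hashing would instead move only ~1/5 of keys: just the
# ranges adjacent to the new node's points on the ring.
```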
Related System Designs
- Distributed Cache (Memcached, Redis Cluster)
- File Storage (Dropbox, Google Drive)
- URL Shortener (TinyURL)