Key-Value Store

Medium · Storage

Overview

Designing a distributed Key-Value Store (like Redis or DynamoDB) is a staple system design question at companies building large-scale systems. Amazon, Google, and Meta all operate massive KV stores, and understanding partitioning, replication, consistency models, and failure handling is essential. This question tests whether you can reason about the CAP theorem, eventual vs. strong consistency, and how to scale storage horizontally. It matters in interviews because KV stores underpin nearly every high-performance system—caching, session storage, feature flags—and the trade-offs you discuss (consistency vs. availability, read vs. write scaling) apply broadly across distributed systems.

Requirements

Functional

  • Put(key, value) and Get(key) operations
  • Delete(key) support
  • Support for key expiration (TTL)
  • Range queries or prefix scans (optional)
  • Atomic compare-and-set or conditional writes
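The functional requirements above can be sketched as a minimal single-node store. This is illustrative only (class and method names are assumptions, not from the design): it shows put/get/delete, lazy TTL expiry on read, and a compare-and-set that is atomic only within a single thread.

```python
import time

class KVStore:
    """Minimal single-node sketch of the functional API:
    put/get/delete, TTL expiry, and compare-and-set."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def put(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry: purge on read
            return None
        return value

    def delete(self, key):
        return self._data.pop(key, None) is not None

    def compare_and_set(self, key, expected, new_value):
        # Atomic only under a single thread/lock; real stores
        # enforce this server-side per partition.
        if self.get(key) != expected:
            return False
        self.put(key, new_value)
        return True
```

A distributed store implements the same contract, but every operation additionally routes through partitioning and replication, covered below.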

Non-Functional

  • Low latency — single-digit ms for get/put
  • High throughput — millions of ops per second
  • Durability — data survives node failures
  • Scalability — petabyte-scale storage

Capacity Estimation

Assume ~1 trillion keys at 1 KB average value ≈ 1 PB of raw data. Traffic: 100K writes/s and 1M reads/s (a 10:1 read-to-write ratio). With 3× replication → 3 PB of total storage.
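A quick sanity check on the arithmetic (note that reaching ~1 PB at 1 KB per value requires on the order of 10^12 keys; the 10 TB-per-node figure below is an assumption for illustration):

```python
# Back-of-envelope capacity check (assumed figures).
KEYS = 1_000_000_000_000      # ~1 trillion keys
AVG_VALUE_BYTES = 1_000       # ~1 KB per value
REPLICATION = 3

raw_bytes = KEYS * AVG_VALUE_BYTES
total_bytes = raw_bytes * REPLICATION

PB = 1_000 ** 5
print(raw_bytes / PB)    # 1.0 PB raw
print(total_bytes / PB)  # 3.0 PB with 3x replication

# Per-node footprint at an assumed 10 TB of usable disk per storage node:
TB = 1_000 ** 4
print(total_bytes // (10 * TB))  # 300 storage nodes
```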

Architecture Diagram

[Architecture diagram: Clients → API Layer → Coordinator → Partitions A/B → Storage Nodes A/B, with a Replication Manager connecting the storage nodes]

Component Deep Dive

API Layer

Handles client connections and routes each request to the appropriate partition. Supports REST or a custom binary protocol.

Partitioning Service

Consistent hashing or range-based partitioning. Maps each key to a partition/shard and handles rebalancing on scale-out.
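The consistent-hashing option can be sketched with a ring of virtual nodes (a minimal sketch; class name, MD5 hashing, and the vnode count are illustrative choices, not from the design). The key property is that removing a node only remaps the keys that node owned.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of consistent hashing with virtual nodes. Each physical
    node gets `vnodes` points on the ring; a key maps to the first
    node clockwise from its hash."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(s):
        # Any uniform hash works; MD5 here for illustration only.
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect_right(self._ring, (h, ""))  # first point after h
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

Virtual nodes smooth out load imbalance: with only one point per physical node, the ring segments vary wildly in size; with ~100 points each, ownership evens out.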

Storage Node

Stores key-value pairs on disk using an LSM tree or B-tree. Handles compaction and garbage collection.
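The LSM write path can be illustrated with a toy sketch (assumed names throughout; real engines add a WAL, bloom filters, and background compaction): writes land in an in-memory memtable, which is flushed as an immutable sorted run when full; reads check the memtable first, then runs from newest to oldest; deletes write tombstones.

```python
class SimpleLSM:
    """Toy sketch of an LSM tree's write path. Illustrative only:
    no WAL, bloom filters, or compaction merging."""

    TOMBSTONE = object()  # marker recorded for deletes

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.sstables = []  # list of sorted (key, value) runs, newest last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        self.put(key, self.TOMBSTONE)  # deletes are just writes

    def get(self, key):
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is self.TOMBSTONE else v
        for run in reversed(self.sstables):  # newest run wins
            for k, v in run:
                if k == key:
                    return None if v is self.TOMBSTONE else v
        return None

    def _flush(self):
        # In a real engine this writes an SSTable file; here, a sorted list.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
```

This is why LSM stores favor write throughput: every write is an in-memory insert plus a sequential flush, while reads may touch several runs — the cost that compaction exists to bound.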

Replication Layer

Replicates data across nodes. Leader-follower or multi-leader. Handles failover.

Gossip / Membership

Maintains cluster membership and failure detection: nodes discover each other via gossip and detect peer failures.
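The failure-detection half can be sketched as a heartbeat timeout (a simplification: the timeout value, class name, and injectable clock are assumptions; gossip-based systems typically use accrual detectors rather than a fixed cutoff):

```python
import time

class FailureDetector:
    """Sketch of heartbeat-based failure detection. A node is
    suspected dead if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable for deterministic testing
        self.last_seen = {}

    def heartbeat(self, node):
        self.last_seen[node] = self.clock()

    def alive_nodes(self):
        now = self.clock()
        return {n for n, t in self.last_seen.items() if now - t < self.timeout}

    def suspected(self):
        return set(self.last_seen) - self.alive_nodes()
```

A suspected node is not immediately evicted; the membership layer typically waits for confirmation (or operator action) before triggering rebalancing, to avoid churn from transient network blips.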

Coordinator

Handles cross-partition transactions if needed, and coordinates quorum reads/writes.
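Quorum coordination rests on the overlap condition R + W > N: any read quorum must intersect any write quorum, so a read always sees at least one replica with the latest write. A minimal sketch (function names and the (value, version) response shape are assumptions; networking is elided):

```python
def quorums_overlap(n, w, r):
    """True if any read quorum of size r must intersect any
    write quorum of size w among n replicas."""
    return r + w > n

def quorum_read(responses, read_quorum):
    """Sketch of a quorum read: take (value, version) responses from
    the first `read_quorum` replicas and return the newest value."""
    collected = responses[:read_quorum]
    if len(collected) < read_quorum:
        raise RuntimeError("not enough replicas responded")
    return max(collected, key=lambda r: r[1])[0]
```

Typical settings for N = 3: W = 2, R = 2 gives read-your-writes at moderate latency; W = 3, R = 1 favors fast reads; W = 1, R = 1 is fastest but violates the overlap condition and admits stale reads.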

Database Design

Each storage node uses an LSM tree (LevelDB, RocksDB) or a B-tree. The partition key determines the shard. Replication works via WAL shipping to synchronous or asynchronous replicas. Cluster metadata (the partition map) lives in ZooKeeper or etcd.

API Design

| Method | Path | Description |
|--------|------|-------------|
| PUT | /v1/keys/{key} | Store key-value. Body: {value, ttl?}. Returns 200. |
| GET | /v1/keys/{key} | Retrieve value. Returns 200 + value, or 404. |
| DELETE | /v1/keys/{key} | Delete key. Returns 200. |
| GET | /v1/keys?prefix={p} | List keys with the given prefix (if supported). |

Scalability & Trade-offs
