Distributed Message Queue (Kafka)

Hard · Messaging

Overview

Designing a Distributed Message Queue (Kafka, RabbitMQ) tests your understanding of pub/sub, partitioning, replication, and consumer groups. The core challenges include: ordering guarantees (per-partition), durability (replicated log), and scaling consumers. This design matters in interviews because message queues are the backbone of event-driven architecture—used in notification systems, data pipelines, and microservices. Demonstrating you understand topics, partitions, offsets, consumer groups, and how to achieve exactly-once semantics shows you can design systems for reliable, scalable message delivery. Companies like LinkedIn (Kafka), Uber, and Netflix rely on these systems at massive scale.

Requirements

Functional

  • Produce messages to topics
  • Consume messages (push or pull)
  • Message ordering per partition
  • Consumer groups for parallel consumption
  • Retention and replay (read from offset)
  • Exactly-once semantics (optional)

Non-Functional

  • High throughput — millions of messages/sec
  • Durability — replicated, persisted
  • Low latency — produce ack <5ms
  • Scalability — add partitions, add consumers

Capacity Estimation

Assume 1M msg/sec at 1KB average = 1GB/sec ingest. 7-day retention ≈ 600TB of raw log (≈1.8PB with replication factor 3). 1000 partitions per topic.
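The back-of-envelope numbers above can be checked with a few lines of arithmetic (the figures are the stated assumptions, not measurements):

```python
# Capacity check for the stated assumptions:
# 1M msg/sec, 1 KB average message, 7-day retention, replication factor 3.
msgs_per_sec = 1_000_000
avg_msg_bytes = 1_000                                  # 1 KB average

ingest_per_sec = msgs_per_sec * avg_msg_bytes          # bytes/sec
retention_secs = 7 * 24 * 3600                         # 7 days
raw_retained = ingest_per_sec * retention_secs         # one copy of the log
replicated = raw_retained * 3                          # replication factor 3

print(f"Ingest: {ingest_per_sec / 1e9:.1f} GB/sec")
print(f"Retained (raw): {raw_retained / 1e12:.0f} TB")
print(f"Retained (x3 replication): {replicated / 1e12:.1f} TB")
```

This gives ~1GB/sec ingest and ~605TB of raw retained log, matching the ~600TB estimate; with replication factor 3 the actual disk footprint is roughly three times that.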

Architecture Diagram

Producers → [Broker 1 | Broker 2 | Broker 3] → Consumers
(brokers hold the log storage and replicate partitions to each other; ZooKeeper/KRaft coordinates the cluster)

Component Deep Dive

Broker

Stores partitions (log segments). Handles produce/consume. Leader for some partitions, replica for others.

ZooKeeper / KRaft

Stores cluster metadata, performs leader election and partition assignment. Kafka is replacing ZooKeeper with KRaft, a built-in Raft-based metadata quorum.

Producer

Sends messages to the partition leader. Chooses a partition by key hash or explicit assignment. Waits for acknowledgment per the acks setting (0 = none, 1 = leader only, all = all in-sync replicas).

Consumer

Pulls messages from brokers. Each partition is consumed by exactly one member of a consumer group. Commits offsets so it can resume after a restart.
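The pull-and-commit loop can be sketched with a toy in-memory broker (the `Broker` class here is a stand-in, not a real client API). Processing before committing yields at-least-once delivery: a crash between `handle` and `commit` causes redelivery.

```python
class Broker:
    """Toy single-partition broker: an append-only log plus committed
    offsets per consumer group."""
    def __init__(self):
        self.log = []                 # one partition's append-only log
        self.committed = {}           # group_id -> next offset to read

    def produce(self, msg):
        self.log.append(msg)

    def fetch(self, offset, limit=100):
        # Return (offset, message) pairs starting at `offset`.
        return list(enumerate(self.log[offset:offset + limit], start=offset))

    def commit(self, group_id, offset):
        self.committed[group_id] = offset

def consume_once(broker, group_id, handle):
    offset = broker.committed.get(group_id, 0)   # resume from last commit
    for off, msg in broker.fetch(offset):
        handle(msg)                              # crash here => redelivery
        offset = off + 1
    broker.commit(group_id, offset)              # commit after processing
```

Committing before processing flips the trade-off to at-most-once: a crash after commit but before `handle` loses the message.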

Log Storage

Append-only log per partition. Segmented by size/time. Replicated across brokers.
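A minimal sketch of a segmented append-only log, simplified from Kafka's on-disk layout (real segments are files named by their base offset, with index files alongside; segment size here is a record count rather than bytes):

```python
class SegmentedLog:
    """Append-only partition log split into fixed-size segments."""

    def __init__(self, segment_size=4):
        self.segment_size = segment_size      # records per segment (toy)
        self.segments = [(0, [])]             # (base_offset, records)
        self.next_offset = 0

    def append(self, record) -> int:
        base, records = self.segments[-1]
        if len(records) >= self.segment_size: # roll a new segment when full
            self.segments.append((self.next_offset, []))
            base, records = self.segments[-1]
        records.append(record)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset):
        # Find the newest segment whose base offset <= requested offset.
        for base, records in reversed(self.segments):
            if offset >= base:
                return records[offset - base]
        raise IndexError(offset)
```

Segmenting makes retention cheap: expiring old data is deleting whole closed segments, never rewriting the active one.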

Coordinator

Assigns partitions to consumers in group. Handles rebalance on join/leave.

Database Design

No traditional DB. Log segments on disk. Metadata (partition assignment, offsets) in ZooKeeper or KRaft. Consumer offsets in __consumer_offsets topic.

API Design

Method | Path                      | Description
POST   | /topics/{topic}/messages  | Produce. Body: {key?, value, partition?}. Returns offset.
GET    | /topics/{topic}/messages  | Consume. Query: partition, offset, limit. Returns messages.
POST   | /consumer/commit          | Commit offset. Enables resume after restart.

Scalability & Trade-offs

Related System Designs