Distributed Message Queue (Kafka)
Overview
Designing a Distributed Message Queue (Kafka, RabbitMQ) tests your understanding of pub/sub, partitioning, replication, and consumer groups. The core challenges include: ordering guarantees (per-partition), durability (replicated log), and scaling consumers. This design matters in interviews because message queues are the backbone of event-driven architecture—used in notification systems, data pipelines, and microservices. Demonstrating you understand topics, partitions, offsets, consumer groups, and how to achieve exactly-once semantics shows you can design systems for reliable, scalable message delivery. Companies like LinkedIn (Kafka), Uber, and Netflix rely on these systems at massive scale.
Requirements
Functional
- Produce messages to topics
- Consume messages (push or pull)
- Message ordering per partition
- Consumer groups for parallel consumption
- Retention and replay (read from offset)
- Exactly-once semantics (optional)
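The exactly-once requirement above is usually met with broker-side deduplication. The following is a toy sketch of Kafka's producer-id + sequence-number idea (function name and shape are illustrative, not the real API): a retried send carrying an already-seen sequence number is acknowledged without being appended again.

```python
def produce_idempotent(log, last_seq, producer_id, seq, value):
    """Broker-side dedup sketch: each producer numbers its sends;
    a retry with a stale sequence number is dropped (but still acked),
    so producer retries cannot create duplicates in the log."""
    if last_seq.get(producer_id, -1) >= seq:
        return False            # duplicate retry: ack without appending
    last_seq[producer_id] = seq
    log.append(value)
    return True
```

Pairing this with transactional offset commits is what turns at-least-once delivery into end-to-end exactly-once processing.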
Non-Functional
- High throughput — millions of messages/sec
- Durability — replicated, persisted
- Low latency — produce ack <5ms
- Scalability — add partitions, add consumers
Capacity Estimation
Assume 1M msg/sec at 1KB average = 1GB/sec ingest. 7-day retention = ~600TB raw, ~1.8PB with replication factor 3. 1000 partitions per topic.
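The estimate above is worth being able to reproduce on the spot; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the capacity numbers above.
msg_rate = 1_000_000            # messages/sec
msg_size = 1_000                # bytes (1 KB average)
retention_days = 7
replication = 3

ingest_per_sec = msg_rate * msg_size                        # bytes/sec
raw_retained = ingest_per_sec * 86_400 * retention_days     # bytes over 7 days
total_with_replicas = raw_retained * replication

print(f"ingest: {ingest_per_sec / 1e9:.1f} GB/sec")         # 1.0 GB/sec
print(f"raw retained: {raw_retained / 1e12:.0f} TB")        # ~605 TB
print(f"with RF=3: {total_with_replicas / 1e15:.2f} PB")    # ~1.81 PB
```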
Architecture Diagram
Component Deep Dive
Broker
Stores partitions (log segments). Handles produce/consume. Leader for some partitions, replica for others.
ZooKeeper / KRaft
Cluster metadata, leader election, partition assignment. Kafka is moving to KRaft (no ZooKeeper).
Producer
Sends messages to a broker. Chooses the partition by key hash, explicit assignment, or round-robin. Waits for ack: acks=0 (no wait), acks=1 (leader persisted), acks=all (all in-sync replicas persisted).
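Partition selection can be sketched as below. Kafka's default partitioner hashes keys with murmur2; this sketch uses stdlib md5 instead, which preserves the property that matters: every message with the same key lands on the same partition (keeping per-key ordering), while keyless messages are spread round-robin.

```python
import hashlib

def choose_partition(key, num_partitions, round_robin_state):
    """Producer-side partition selection sketch. Keyed messages hash
    deterministically to one partition; keyless messages rotate
    through partitions via a mutable counter (round_robin_state[0])."""
    if key is not None:
        digest = hashlib.md5(key).digest()
        return int.from_bytes(digest[:4], "big") % num_partitions
    round_robin_state[0] = (round_robin_state[0] + 1) % num_partitions
    return round_robin_state[0]
```

Note the trade-off this encodes: changing `num_partitions` remaps keys, which is why adding partitions to a keyed topic breaks per-key ordering for in-flight data.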
Consumer
Pulls from broker. Consumer group shares partitions. Commits offset for resume.
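The pull-and-commit cycle looks roughly like this; `fetch`, `process`, and `commit` are hypothetical callables standing in for broker RPCs and application logic. Committing *after* processing gives at-least-once delivery: a crash between process and commit replays a batch rather than losing it.

```python
def consume_loop(fetch, process, commit, start_offset=0):
    """Minimal pull-based consume loop. fetch(offset, max_messages)
    returns the next batch from the partition; commit(offset) records
    the next offset to read so a restarted consumer resumes there."""
    offset = start_offset
    while True:
        batch = fetch(offset, max_messages=100)
        if not batch:
            break
        for msg in batch:
            process(msg)
        offset += len(batch)
        commit(offset)      # persist progress only after processing
    return offset
```

Committing *before* processing flips the guarantee to at-most-once (crashes skip messages instead of replaying them).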
Log Storage
Append-only log per partition. Segmented by size/time. Replicated across brokers.
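A toy in-memory version of a segmented log shows why segmentation helps: retention becomes "delete whole old segments," and offset lookup is a binary search over segment base offsets. (Real brokers roll segments by bytes or time and keep them as files; this sketch keeps everything in lists.)

```python
import bisect

class SegmentedLog:
    """Append-only log split into fixed-size segments, indexed by
    each segment's base offset (the offset of its first record)."""
    def __init__(self, segment_size=1024):
        self.segment_size = segment_size
        self.segments = [[]]        # each segment is a list of records
        self.base_offsets = [0]     # first offset in each segment

    def append(self, record):
        if len(self.segments[-1]) == self.segment_size:
            # roll a new segment once the active one is full
            self.base_offsets.append(self.base_offsets[-1] + self.segment_size)
            self.segments.append([])
        self.segments[-1].append(record)
        return self.base_offsets[-1] + len(self.segments[-1]) - 1

    def read(self, offset):
        # binary-search for the segment containing this offset
        i = bisect.bisect_right(self.base_offsets, offset) - 1
        return self.segments[i][offset - self.base_offsets[i]]
```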
Coordinator
Assigns partitions to consumers in group. Handles rebalance on join/leave.
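The assignment step can be sketched as a range-style strategy: on each rebalance the coordinator sorts the members, gives each a contiguous block of partitions, and spreads any remainder over the first few. (Kafka also ships round-robin and sticky strategies; this is just the simplest one.)

```python
def assign_partitions(consumers, num_partitions):
    """Range-style partition assignment run on every rebalance.
    Sorting makes the result deterministic across coordinators."""
    members = sorted(consumers)
    per, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per + (1 if i < extra else 0)   # remainder goes to first members
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment
```

A consequence visible in the sketch: with more consumers than partitions, the extra consumers get empty assignments and sit idle, which is why partition count caps consumer-group parallelism.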
Database Design
No traditional DB. Log segments on disk. Metadata (partition assignment, offsets) in ZooKeeper or KRaft. Consumer offsets in the `__consumer_offsets` topic.
API Design
| Method | Path | Description |
|---|---|---|
| POST | /topics/{topic}/messages | Produce. Body: `{key?, value, partition?}`. Returns offset. |
| GET | /topics/{topic}/messages | Consume. Query: `partition`, `offset`, `limit`. Returns messages. |
| POST | /consumer/commit | Commit offset. Enables resume after restart. |
Scalability & Trade-offs
- Pull vs push: Pull gives consumer control (backpressure); push is simpler, lower latency. Kafka uses pull.
- Ordering vs parallelism: Ordering per partition; more partitions = more parallelism but no cross-partition order.
- Retention: Longer retention enables replay but increases storage. Tiered storage (hot/cold) reduces cost.
Related System Designs
- Notification System
- URL Shortener (TinyURL)
- Rate Limiter