Chat/Messaging System (WhatsApp)

Hard Real-time

Overview

Designing a chat/messaging system (such as WhatsApp, Slack, or iMessage) is a challenging system design question that tests real-time delivery, offline support, and consistency. The core challenges are delivering messages with low latency, handling millions of concurrent connections (WebSockets or long polling), storing and syncing message history, and supporting group chats with fan-out. The design matters in interviews because it combines WebSockets, message queues, databases, and caching, and it requires careful thinking about message ordering, idempotency, and read receipts. Companies like Meta, Google, and Slack build these systems at massive scale, so walking through the full flow from sender to recipient demonstrates senior-level system design skills.

Requirements

Functional

  • Send and receive 1:1 and group messages
  • Message delivery status (sent, delivered, read)
  • Offline message storage and sync when user comes online
  • Message history and search
  • Media attachments (images, files)
  • Typing indicators and presence

Non-Functional

  • Low latency — message delivery <100ms
  • High availability — 99.99% uptime
  • Consistency — messages in order, no duplicates
  • Scalability — millions of concurrent connections

Capacity Estimation

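Assume 500M users and 100B messages/day, which averages about 1.2M messages/sec. Each gateway server holds about 50K concurrent connections. At 1 KB per message, storage grows by 100B * 1 KB = 100 TB/day, or roughly 36.5 PB/year before replication and compression.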
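The arithmetic behind these estimates can be checked with a short script. Note that 100B messages at 1 KB each comes to 100 TB per day, so a yearly figure is 365 times that. The user count, message volume, message size, and connections per server come from the text; the share of users connected at peak (10%) is an illustrative assumption only.

```python
# Back-of-the-envelope capacity check for the estimates above.
SECONDS_PER_DAY = 24 * 60 * 60

users = 500_000_000
messages_per_day = 100_000_000_000
avg_message_bytes = 1_000            # 1 KB per message (from the text)
connections_per_server = 50_000      # from the text

# Assumption (not from the text): 10% of users are connected at peak.
peak_online_percent = 10

messages_per_sec = messages_per_day / SECONDS_PER_DAY
storage_per_day_tb = messages_per_day * avg_message_bytes / 1e12
storage_per_year_pb = storage_per_day_tb * 365 / 1_000
peak_connections = users * peak_online_percent // 100
gateway_servers = peak_connections // connections_per_server

print(f"{messages_per_sec:,.0f} msg/sec on average")
print(f"{storage_per_day_tb:,.0f} TB/day, {storage_per_year_pb:,.1f} PB/year")
print(f"{gateway_servers:,} gateway servers for {peak_connections:,} connections")
```

Peak traffic is usually a multiple of the average, so the per-second figure is a floor rather than a provisioning target.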

Architecture Diagram

[Diagram components: Clients, WebSocket Gateway, Message Service, Kafka, Message Store, Presence Svc, Media Store, Sync Service]

Component Deep Dive

Connection Manager

Maintains WebSocket or long-poll connections (the WebSocket Gateway in the diagram). Routes each message to the correct connection. Runs as multiple instances behind a load balancer.

Message Service

Receives messages, validates, stores, publishes to queue. Handles idempotency.
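One common way to handle idempotency is to have the client attach its own ID to each send and deduplicate on it, so a retried request returns the original message_id instead of creating a second message. The sketch below is an in-memory stand-in, not a definitive implementation: the class name, the `client_msg_id` parameter, and the dict/list used in place of the real store and queue are all hypothetical.

```python
import itertools


class MessageService:
    """In-memory sketch of validate -> store -> publish with deduplication."""

    def __init__(self):
        self.store = {}        # message_id -> message (stand-in for the database)
        self.published = []    # stand-in for the message queue
        self._seen = {}        # (sender_id, client_msg_id) -> message_id
        self._ids = itertools.count(1)

    def send(self, sender_id, chat_id, content, client_msg_id):
        # Validate before doing any work.
        if not content:
            raise ValueError("content must not be empty")

        # A retry with the same client-generated ID returns the original result.
        key = (sender_id, client_msg_id)
        if key in self._seen:
            return self._seen[key]

        message_id = next(self._ids)
        message = {
            "message_id": message_id,
            "chat_id": chat_id,
            "sender_id": sender_id,
            "content": content,
        }
        self.store[message_id] = message
        self.published.append(message)
        self._seen[key] = message_id
        return message_id


service = MessageService()
first = service.send("alice", "chat-1", "hello", client_msg_id="k1")
retry = service.send("alice", "chat-1", "hello", client_msg_id="k1")
```

In a real deployment the deduplication table would need to be shared across service instances and expire old keys; this sketch ignores both concerns.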

Message Queue

Kafka. Fan-out to online users' connection managers. Persists for offline delivery.
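The fan-out step can be sketched without any queue library: for each chat member, look up which connection manager holds their connection and group deliveries by it, leaving users with no live connection for offline delivery. The function name and the shape of the `online_gateways` mapping are assumptions made for illustration; no Kafka API is used here.

```python
from collections import defaultdict


def fan_out(message, members, online_gateways):
    """Split one message into per-gateway deliveries and an offline list.

    members: user ids in the chat.
    online_gateways: user_id -> gateway id, only for users with a live connection.
    """
    per_gateway = defaultdict(list)
    offline = []
    for user_id in members:
        if user_id == message["sender_id"]:
            continue  # the sender already has the message
        gateway = online_gateways.get(user_id)
        if gateway is None:
            offline.append(user_id)  # delivered later, when the user reconnects
        else:
            per_gateway[gateway].append(user_id)
    return dict(per_gateway), offline


deliveries, offline_users = fan_out(
    {"sender_id": "alice", "content": "hi all"},
    members=["alice", "bob", "carol", "dave"],
    online_gateways={"alice": "gw-1", "bob": "gw-1", "carol": "gw-2"},
)
```

Grouping by gateway means each connection manager receives one delivery per message rather than one per recipient, which matters for large groups.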

Message Store

Cassandra/Scylla. Stores messages by chat_id, message_id. Supports range queries for history.
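The access pattern this layout supports is "the newest N messages in a chat older than a given message". The sketch below imitates it in memory, assuming message_ids are sortable and increase over time within a chat; the class and method names are hypothetical and no database driver is involved.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """In-memory imitation of rows partitioned by chat_id, ordered by message_id."""

    def __init__(self):
        self._ids = defaultdict(list)   # chat_id -> sorted list of message_ids
        self._rows = {}                 # (chat_id, message_id) -> row

    def insert(self, chat_id, message_id, sender_id, content):
        # Re-inserting the same key overwrites the row, like an upsert.
        if (chat_id, message_id) not in self._rows:
            bisect.insort(self._ids[chat_id], message_id)
        self._rows[(chat_id, message_id)] = {
            "message_id": message_id,
            "sender_id": sender_id,
            "content": content,
        }

    def history(self, chat_id, before=None, limit=50):
        """Return up to `limit` rows, newest first, with message_id < before."""
        ids = self._ids.get(chat_id, [])
        end = len(ids) if before is None else bisect.bisect_left(ids, before)
        start = max(0, end - limit)
        return [self._rows[(chat_id, i)] for i in reversed(ids[start:end])]


store = MessageStore()
for n in range(1, 6):
    store.insert("chat-1", n, "alice", f"message {n}")

latest = store.history("chat-1", limit=2)
older = store.history("chat-1", before=4, limit=2)
```

Because every query names a single chat_id, each read touches one partition, which is the property that makes this key layout suit history paging.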

Presence Service

Tracks user online/offline status. Redis with heartbeat. Informs connection manager.
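Heartbeat-based presence boils down to "a user is online if a heartbeat arrived within the last TTL seconds". The sketch below shows that rule with an in-memory dict and an injectable clock; the class name and the 30-second TTL are assumptions. With Redis, the same idea is often expressed as a key that is rewritten with an expiry on every heartbeat.

```python
import time


class PresenceTracker:
    """A user counts as online if a heartbeat arrived within the last `ttl_seconds`."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_beat = {}         # user_id -> time of the last heartbeat

    def heartbeat(self, user_id):
        self._last_beat[user_id] = self._clock()

    def is_online(self, user_id):
        last = self._last_beat.get(user_id)
        return last is not None and self._clock() - last < self._ttl


# Usage with a fake clock so that time can be advanced by hand.
now = [0.0]
presence = PresenceTracker(ttl_seconds=30.0, clock=lambda: now[0])
presence.heartbeat("alice")
online_at_start = presence.is_online("alice")
now[0] = 31.0
online_after_timeout = presence.is_online("alice")
```

The TTL trades accuracy for load: a shorter value detects disconnects sooner but requires more frequent heartbeats from every connected client.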

Media Store

Object store (S3) for attachments. Messages store URLs.

Sync Service

For offline users: on connect, fetches messages since last_seen. Handles conflict resolution.
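The catch-up step on reconnect can be sketched as a per-chat cursor comparison. This assumes last_seen is tracked as the highest message_id the client already holds in each chat, which the text does not specify; the function name and the `page_size` parameter are likewise hypothetical. Conflict resolution is not covered here.

```python
def sync_since(messages_by_chat, last_seen, page_size=100):
    """Return, per chat, the messages newer than the client's cursor.

    messages_by_chat: chat_id -> list of (message_id, content), sorted by message_id.
    last_seen: chat_id -> highest message_id the client already has.
    """
    missed = {}
    for chat_id, messages in messages_by_chat.items():
        cursor = last_seen.get(chat_id, 0)   # unknown chat: send from the start
        newer = [m for m in messages if m[0] > cursor]
        if newer:
            missed[chat_id] = newer[:page_size]  # oldest first, capped per chat
    return missed


server_state = {
    "chat-1": [(1, "hi"), (2, "are you there?"), (3, "ping")],
    "chat-2": [(1, "meeting at 5")],
}
missed = sync_since(server_state, last_seen={"chat-1": 1, "chat-2": 1})
```

Sending the oldest missed messages first lets the client advance its cursor safely even if the connection drops partway through the sync.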

Database Design

Tables:

  • Messages: chat_id (partition key), message_id (clustering key), sender_id, content, created_at
  • User_chats: user_id, chat_id, last_read

Storage: Cassandra for messages, Redis for presence, MySQL for user metadata.
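As one illustration of how the two tables work together, an unread count for a chat can be derived from the message_id ordering in Messages and the last_read marker in User_chats. The sketch assumes last_read stores a message_id (it could equally be a timestamp, which the text leaves open); the function name is hypothetical.

```python
import bisect


def unread_count(message_ids, last_read):
    """Count messages newer than the user's last_read marker.

    message_ids: sorted message ids for one chat (clustering-key order).
    last_read: highest message_id the user has read, or None if nothing was read.
    """
    if last_read is None:
        return len(message_ids)
    # bisect_right finds the position just past last_read in the sorted ids.
    return len(message_ids) - bisect.bisect_right(message_ids, last_read)


unread = unread_count([1, 2, 3, 4, 5], last_read=3)
```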

API Design

Method  Path                                     Description
POST    /api/messages                            Send message. Body: {chat_id, content, attachments?}. Returns message_id.
GET     /api/chats/{id}/messages?before=&limit=  Get message history. Paginated.
POST    /api/messages/{id}/read                  Mark as read. Updates read receipt.
GET     /api/chats                               List user's chats with last message.

Scalability & Trade-offs
