Notification System
Overview
A Notification System design question evaluates your ability to design for high fan-out, multiple delivery channels (push, email, SMS, in-app), and reliability at scale. Companies like Facebook, Twitter, and Slack send billions of notifications daily. The core challenges are delivering to millions of users quickly, supporting multiple delivery channels, handling failures and retries, and respecting user preferences. The question is popular in interviews because it combines message queues, worker pools, idempotency, and event-driven architecture, all of which appear in real production systems. Showing that you understand push notification infrastructure (FCM, APNs) and how to avoid duplicate or lost notifications demonstrates senior-level systems knowledge.
Requirements
Functional
- Send notifications via push, email, SMS, in-app
- Support templates with variable substitution
- User preference management (opt-in/opt-out per channel)
- Delivery status tracking and retries
- Scheduled and batch notifications
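The template requirement above can be illustrated with a minimal sketch using Python's standard `string.Template`. The template ID, wording, and variable names are hypothetical; a real system would probably load templates from the database and handle localization.

```python
from string import Template

# Hypothetical template registry; the ID and wording are illustrative only.
TEMPLATES = {
    "order_shipped": Template("Hi $name, your order $order_id has shipped."),
}


def render(template_id: str, data: dict) -> str:
    """Render a template. Raises KeyError if the template or a variable is missing."""
    return TEMPLATES[template_id].substitute(data)


message = render("order_shipped", {"name": "Ada", "order_id": "A-1001"})
print(message)
```

Failing loudly on a missing variable (rather than sending a half-filled message) is a design choice of this sketch, not something the requirements dictate.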
Non-Functional
- Low latency: deliver within seconds
- High throughput: on the order of 1B notifications per day (see Capacity Estimation)
- Reliability: no lost notifications; at-least-once delivery
- Idempotency: duplicate events should not double-send
Capacity Estimation
Assume 100M users receiving 10 notifications each per day, or 1B notifications/day (about 11.6K/sec on average). Plan for a peak of 50K/sec. Channel mix: push 70%, email 20%, SMS 10%.
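The arithmetic behind these estimates can be checked with a short script. All inputs are the assumptions stated above; the variable names are just for illustration.

```python
USERS = 100_000_000
PER_USER_PER_DAY = 10
SECONDS_PER_DAY = 86_400
PEAK_PER_SEC = 50_000
CHANNEL_MIX = {"push": 0.70, "email": 0.20, "sms": 0.10}

daily_total = USERS * PER_USER_PER_DAY            # notifications per day
average_per_sec = daily_total / SECONDS_PER_DAY   # average rate over 24 hours
peak_to_average = PEAK_PER_SEC / average_per_sec  # how bursty the peak is

# Split the daily volume and the peak rate by channel.
daily_by_channel = {ch: round(daily_total * share) for ch, share in CHANNEL_MIX.items()}
peak_by_channel = {ch: round(PEAK_PER_SEC * share) for ch, share in CHANNEL_MIX.items()}

print(f"daily total:      {daily_total:,}")
print(f"average per sec:  {average_per_sec:,.0f}")
print(f"peak / average:   {peak_to_average:.2f}x")
print(f"daily by channel: {daily_by_channel}")
print(f"peak by channel:  {peak_by_channel}")
```

The peak is a little over four times the average, which is the headroom the queue and worker pool need to absorb.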
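Architecture Diagram
(Diagram not reproduced. Flow, as described in the components below: Notification API -> Message Queue -> Worker Pool -> Channel Services, with the User Preference Store and Delivery Tracker consulted along the way.)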
Component Deep Dive
Notification API
Accepts notification requests. Validates, enriches with user prefs, publishes to queue.
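A minimal sketch of that validate, enrich, publish flow is below. It assumes an in-memory `queue.Queue` and a plain dict as stand-ins for the real message queue and preference store; the function name `handle_notify`, the `opted_in` field, and the default opt-in policy are hypothetical.

```python
import queue
import uuid

ALLOWED_CHANNELS = {"push", "email", "sms", "in_app"}
REQUIRED_FIELDS = ("user_id", "channel", "template_id", "data")

# Stand-ins for the real preference store and message queue (assumptions).
PREFERENCES = {"u1": {"push": True, "email": False}}
outbound = queue.Queue()


def handle_notify(body: dict) -> tuple[int, dict]:
    """Validate a request, attach the user's preference, and enqueue it."""
    missing = [f for f in REQUIRED_FIELDS if f not in body]
    if missing:
        return 400, {"error": f"missing fields: {', '.join(missing)}"}
    if body["channel"] not in ALLOWED_CHANNELS:
        return 400, {"error": f"unknown channel: {body['channel']}"}

    prefs = PREFERENCES.get(body["user_id"], {})
    event = {
        **body,
        "id": str(uuid.uuid4()),
        # Defaulting to opted-in when no preference exists is an assumption.
        "opted_in": prefs.get(body["channel"], True),
    }
    outbound.put(event)
    # 202: the request is accepted for processing, not yet delivered.
    return 202, {"id": event["id"], "status": "queued"}
```

Returning as soon as the event is enqueued keeps the caller from blocking on slow downstream providers.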
Message Queue
Kafka or SQS. Decouples producers from workers. Enables retries and backpressure.
Worker Pool
Consumes from queue, routes to channel-specific handlers. Scales horizontally.
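As a rough illustration of the worker pool, the sketch below uses threads and an in-process `queue.Queue` in place of Kafka or SQS consumers. The handler functions are placeholders that return a string instead of calling a provider, and all names are hypothetical.

```python
import queue
import threading


def send_push(event):   # placeholder for an FCM/APNs call
    return f"push:{event['user_id']}"


def send_email(event):  # placeholder for an email provider call
    return f"email:{event['user_id']}"


def send_sms(event):    # placeholder for an SMS provider call
    return f"sms:{event['user_id']}"


HANDLERS = {"push": send_push, "email": send_email, "sms": send_sms}


def worker(jobs: queue.Queue, results: list) -> None:
    """Pull events until a None sentinel arrives, routing each by channel."""
    while True:
        event = jobs.get()
        if event is None:
            break
        handler = HANDLERS.get(event["channel"])
        if handler is None:
            results.append(f"unroutable:{event['channel']}")
        else:
            results.append(handler(event))


def run_pool(events, num_workers: int = 3) -> list:
    jobs, results = queue.Queue(), []
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for e in events:
        jobs.put(e)
    for _ in threads:      # one sentinel per worker to shut the pool down
        jobs.put(None)
    for t in threads:
        t.join()
    return results


events = [
    {"user_id": "u1", "channel": "push"},
    {"user_id": "u2", "channel": "email"},
    {"user_id": "u3", "channel": "fax"},
]
results = run_pool(events)
print(sorted(results))
```

Scaling horizontally here would mean running more consumer processes against the shared queue rather than more threads in one process.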
Channel Services
Push (FCM/APNs), Email (SendGrid), SMS (Twilio). Each has rate limits and retry logic.
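The retry logic mentioned above is often implemented as exponential backoff with jitter. The sketch below is one possible version, not any provider's prescribed policy; `TransientError`, `send_with_retry`, and the delay parameters are all hypothetical.

```python
import random
import time


class TransientError(Exception):
    """A failure worth retrying, e.g. a timeout or a rate-limit response."""


def send_with_retry(send, payload, max_attempts=4, base_delay=0.5,
                    max_delay=8.0, sleep=time.sleep):
    """Call send(payload), retrying transient failures with capped backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except TransientError:
            if attempt == max_attempts:
                raise  # give up; the caller can park the event for inspection
            # Exponential growth: base, 2*base, 4*base, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # "Full jitter": sleep a random time up to the ceiling so that
            # many workers do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable so the function can be exercised without actually waiting.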
User Preference Store
Stores opt-in/opt-out per channel. Checked before sending.
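A small sketch of the per-channel check follows. A dict stands in for the real store, and the class name, method names, and the default-opt-in policy are assumptions made for illustration.

```python
CHANNELS = ("push", "email", "sms", "in_app")


class PreferenceStore:
    """In-memory stand-in for the preference store: user_id -> {channel: bool}."""

    def __init__(self, default_opt_in: bool = True):
        self._prefs = {}
        # Whether users with no recorded preference receive messages (assumption).
        self._default = default_opt_in

    def set_preference(self, user_id: str, channel: str, opted_in: bool) -> None:
        if channel not in CHANNELS:
            raise ValueError(f"unknown channel: {channel}")
        self._prefs.setdefault(user_id, {})[channel] = opted_in

    def is_allowed(self, user_id: str, channel: str) -> bool:
        return self._prefs.get(user_id, {}).get(channel, self._default)


def filter_channels(store: PreferenceStore, user_id: str, requested: list) -> list:
    """Keep only the requested channels the user has not opted out of."""
    return [ch for ch in requested if store.is_allowed(user_id, ch)]


store = PreferenceStore()
store.set_preference("u1", "sms", False)
print(filter_channels(store, "u1", ["push", "sms", "email"]))
```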
Delivery Tracker
Records delivery status, supports idempotency keys. Used for analytics and retries.
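One way idempotency keys might work is sketched below with an in-memory cache that has a time-to-live. It imitates the "set if absent, with expiry" pattern that a Redis-backed cache would typically provide; the key format, class name, and TTL are hypothetical.

```python
import time


class IdempotencyCache:
    """In-memory stand-in for a shared cache with per-key expiry."""

    def __init__(self, ttl_seconds: float = 86_400, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}

    def claim(self, key: str) -> bool:
        """Return True if this caller is the first to claim the key."""
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # already claimed and not yet expired
        self._expires_at[key] = now + self._ttl
        return True

    def release(self, key: str) -> None:
        self._expires_at.pop(key, None)


def deliver_once(cache: IdempotencyCache, event: dict, send) -> str:
    # Hypothetical key: one delivery per (event, user, channel).
    key = f"{event['event_id']}:{event['user_id']}:{event['channel']}"
    if not cache.claim(key):
        return "duplicate_skipped"
    try:
        send(event)
    except Exception:
        cache.release(key)  # let a later retry claim the key again
        raise
    return "sent"
```

Releasing the key on failure matters: otherwise a failed send would be recorded as done and the retry would be skipped, turning a duplicate-suppression mechanism into a source of lost notifications.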
Database Design
PostgreSQL or Cassandra for user preferences, templates, delivery logs. Schema: notifications (id, user_id, channel, template_id, status, created_at). Use Redis for idempotency cache.
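The `notifications` schema can be made concrete with a small runnable sketch. SQLite is used here only so the example runs without a server; the column types, the `CHECK` constraint, the default status, and the index are assumptions, and a PostgreSQL or Cassandra schema would differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE notifications (
        id          TEXT PRIMARY KEY,
        user_id     TEXT NOT NULL,
        channel     TEXT NOT NULL
                    CHECK (channel IN ('push', 'email', 'sms', 'in_app')),
        template_id TEXT NOT NULL,
        status      TEXT NOT NULL DEFAULT 'queued',
        created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")
# Supports "recent notifications for a user" lookups (assumed access pattern).
conn.execute(
    "CREATE INDEX idx_notifications_user ON notifications (user_id, created_at)"
)

# Record a notification when it is queued, then update it after delivery.
conn.execute(
    "INSERT INTO notifications (id, user_id, channel, template_id) VALUES (?, ?, ?, ?)",
    ("n1", "u1", "push", "order_shipped"),
)
conn.execute("UPDATE notifications SET status = 'delivered' WHERE id = ?", ("n1",))

row = conn.execute(
    "SELECT user_id, channel, status FROM notifications WHERE id = ?", ("n1",)
).fetchone()
print(row)
```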
API Design
| Method | Path | Description |
|---|---|---|
| POST | /api/v1/notify | Send notification. Body: {user_id, channel, template_id, data}. Returns 202 Accepted. |
| GET | /api/v1/notifications/{id} | Get delivery status. |
| PUT | /api/v1/users/{id}/preferences | Update notification preferences. |
Scalability & Trade-offs
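- At-least-once vs exactly-once: At-least-once is simpler but can redeliver, so consumers deduplicate with idempotency keys. True end-to-end exactly-once is rarely achievable because external providers (FCM, APNs, email, SMS) sit outside the system's transactions; the practical target is at-least-once delivery plus deduplication.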
- Sync vs async: Async (queue) scales better; sync is simpler for low volume but blocks caller.
- Fan-out: Per-user queues scale but add complexity; single queue with partitioning is simpler.
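The "single queue with partitioning" option usually means hashing a key such as the user ID to pick a partition, so that one user's notifications stay ordered within a partition. A minimal sketch follows; SHA-256 is chosen here only because it is stable across processes, it is not what any particular queue uses by default, and the function name and partition count are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a partition using a hash that is stable across processes.

    Python's built-in hash() is salted per process for strings, so it is
    unsuitable when producers on different machines must agree.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


NUM_PARTITIONS = 12  # illustrative value

# The same user always maps to the same partition.
same = partition_for("user-42", NUM_PARTITIONS) == partition_for("user-42", NUM_PARTITIONS)

# Rough look at how evenly 10,000 synthetic user IDs spread out.
spread = Counter(partition_for(f"user-{i}", NUM_PARTITIONS) for i in range(10_000))
print(same, sorted(spread.items()))
```

Note that changing the partition count remaps most users, which is one reason partition counts tend to be chosen generously up front.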