Rate Limiter
Overview
A Rate Limiter system design question tests your understanding of distributed systems, consistency, and real-time decision-making under load. Companies like Stripe, Uber, and Netflix ask it because rate limiting is critical for API protection, fair usage, and cost control. The challenge lies in enforcing limits across distributed servers without a single point of failure, and in choosing among algorithms (fixed window, sliding window, token bucket, leaky bucket) with different accuracy, burst, and storage trade-offs. This design matters in interviews because it combines those algorithms with systems concepts like Redis, distributed locks, and eventual consistency. Demonstrating knowledge of when to use local vs. global limits and how to handle clock skew shows senior-level thinking.
Requirements
Functional
- Limit number of requests per user/IP per time window
- Support multiple algorithms: fixed window, sliding window, token bucket, leaky bucket
- Return 429 Too Many Requests when limit exceeded
- Include Retry-After header in 429 response
- Support different limits per API endpoint or user tier
- Allow whitelisting of certain clients (e.g., internal services)
Non-Functional
- Low latency — rate limit check must add <10ms overhead
- High availability — rate limiter failure should not block requests (fail open vs fail closed)
- Accuracy — minimize false positives/negatives at window boundaries
- Scalability — support millions of unique keys (user IDs, IPs)
Capacity Estimation
Assume 10M unique keys (user IDs, IPs) and an aggregate peak of ~10K requests/s across all servers (only a small fraction of users are active at once). Each request needs one rate-limit check, so ~10K ops/s against the distributed store. A single Redis node can handle 100K+ ops/s, leaving ample headroom; shard by key hash if traffic grows.
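These estimates can be sanity-checked with quick arithmetic (the per-key memory figure is an assumption, not a measurement):

```python
# Back-of-envelope capacity check for the rate limiter.
unique_keys = 10_000_000        # ~10M users/IPs tracked
peak_checks_per_sec = 10_000    # aggregate rate-limit checks at peak
redis_ops_per_sec = 100_000     # conservative single-node Redis throughput

# Each check is roughly one Redis op (an INCR or one Lua script call).
utilization = peak_checks_per_sec / redis_ops_per_sec
print(f"Redis utilization at peak: {utilization:.0%}")  # 10%

# Memory: each key holds a small counter plus TTL, ~100 bytes with overhead.
memory_gb = unique_keys * 100 / 1e9
print(f"Approx. memory for counters: {memory_gb:.0f} GB")  # 1 GB
```

At 10% utilization a single primary with a replica for failover is enough; the memory footprint fits comfortably in one node.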
Architecture Diagram
(Diagram omitted. Flow: Client → Rate Limit Middleware → Rate Limit Service → Redis/Distributed Store, with the Configuration Service and Metrics & Alerts as supporting components.)
Component Deep Dive
Rate Limit Middleware
Intercepts each request before it reaches the application. Extracts user/key, calls Rate Limit Service, and returns 429 or forwards request.
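The middleware's control flow can be sketched as follows. This is a minimal illustration, not a real framework API: the request shape, the `check_rate_limit` callable, and its `(allowed, retry_after, remaining)` return tuple are all assumptions made for the example.

```python
def rate_limit_middleware(request, check_rate_limit, handler):
    """Intercept a request; return 429 with Retry-After if over the limit."""
    # Extract the rate-limit key (user ID, falling back to IP).
    key = request.get("user_id") or request.get("ip")
    allowed, retry_after, remaining = check_rate_limit(key)
    if not allowed:
        # Limit exceeded: short-circuit with 429 and a Retry-After hint.
        return {"status": 429, "headers": {"Retry-After": str(retry_after)}}
    # Under the limit: forward to the application and annotate the response.
    response = handler(request)
    response.setdefault("headers", {})["X-RateLimit-Remaining"] = str(remaining)
    return response

# Usage with a stub limiter that always allows, 99 requests remaining:
resp = rate_limit_middleware(
    {"user_id": "u1"},
    check_rate_limit=lambda key: (True, 0, 99),
    handler=lambda req: {"status": 200},
)
print(resp["status"])  # 200
```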
Rate Limit Service
Implements the chosen algorithm (e.g., sliding window with Redis). Checks current count, increments if under limit, returns allow/deny.
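The token bucket variant, for instance, can be sketched in memory; a Redis-backed version keeps the same `(tokens, last_refill)` state per key. Names and the injectable `now` clock (handy for tests) are illustrative choices, not a standard API.

```python
import time

class TokenBucket:
    """In-memory token bucket: refills `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, allowing an initial burst
        self.last = time.monotonic() if now is None else now

    def allow(self, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Burst of 2 allowed immediately; third denied; one token refills after 1s.
bucket = TokenBucket(rate=1.0, capacity=2.0, now=0.0)
print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(now=1.0))                      # True
```

The `capacity` parameter controls burst size while `rate` controls the sustained limit, which is the main reason interviewers like this algorithm over a plain counter.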
Redis / Distributed Store
Stores key → (count, window_start) or token bucket state. Uses INCR, EXPIRE for fixed window; Lua scripts for sliding window atomicity.
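The fixed-window INCR/EXPIRE pattern maps to roughly the following logic, sketched here in memory for clarity. With real Redis the read-modify-write below is a single atomic `INCR`, the window boundary is enforced by `EXPIRE` rather than a window ID in the key, and the sliding-window variant wraps its steps in a Lua script so they execute atomically.

```python
class FixedWindowCounter:
    """In-memory analogue of the Redis pattern key = "user:window", INCR + EXPIRE."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window_secs = window_secs
        self.counts = {}  # (key, window_id) -> count; TTL-based pruning omitted

    def allow(self, key, now):
        window_id = int(now // self.window_secs)  # which fixed window `now` falls in
        slot = (key, window_id)
        self.counts[slot] = self.counts.get(slot, 0) + 1  # the INCR step
        return self.counts[slot] <= self.limit

limiter = FixedWindowCounter(limit=2, window_secs=60)
print([limiter.allow("u1", now=t) for t in (0, 1, 2)])  # [True, True, False]
print(limiter.allow("u1", now=61))                      # True (new window)
```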
Configuration Service
Stores limit rules per endpoint, user tier, or key prefix. Allows dynamic updates without redeployment.
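A rule set might look like the following; the field names and schema are illustrative, not a standard format.

```yaml
# Hypothetical rate-limit configuration, matched by endpoint and user tier
rules:
  - match: { endpoint: "/api/search", tier: "free" }
    algorithm: sliding_window
    limit: 100          # requests per window
    window_secs: 60
  - match: { endpoint: "/api/search", tier: "pro" }
    algorithm: token_bucket
    rate_per_sec: 50
    burst: 200
whitelist:
  - "internal-billing-service"   # bypasses all limits
```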
Metrics & Alerts
Tracks rate limit hits, latency, and error rates. Alerts when limits are too aggressive or Redis is overloaded.
Database Design
Redis is the primary store: key = user_id:endpoint:window, value = count, TTL = window size. For sliding window log, use sorted set with timestamps. No traditional DB needed for core logic; config can live in config service or DB.
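The sorted-set sliding-window log corresponds to a ZADD (record the timestamp), ZREMRANGEBYSCORE (evict entries older than the window), and ZCARD (count what remains). An in-memory equivalent, with class and method names chosen for the example:

```python
import bisect

class SlidingWindowLog:
    """Per-key timestamp log, the in-memory analogue of a Redis sorted set."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window_secs = window_secs
        self.logs = {}  # key -> sorted list of request timestamps

    def allow(self, key, now):
        log = self.logs.setdefault(key, [])
        # Evict timestamps outside the window (ZREMRANGEBYSCORE -inf..cutoff).
        cutoff = now - self.window_secs
        del log[:bisect.bisect_right(log, cutoff)]
        if len(log) >= self.limit:  # the ZCARD check
            return False
        bisect.insort(log, now)     # the ZADD step
        return True

sw = SlidingWindowLog(limit=2, window_secs=10.0)
print([sw.allow("u1", t) for t in (0.0, 5.0, 9.0, 11.0)])  # [True, True, False, True]
```

The storage cost — one timestamp per request per key — is exactly the sliding-window overhead mentioned in the trade-offs below; in Redis these steps must run inside a Lua script to stay atomic under concurrent requests.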
API Design
| Method | Path | Description |
|---|---|---|
| GET | /api/check | Internal: check if a request is allowed. Returns 200 with `X-RateLimit-Remaining`, or 429. |
| GET | /api/limits | Get current rate limit status for a user (remaining, reset time). |
| POST | /admin/limits | Update rate limit rules (admin only). |
Scalability & Trade-offs
- Fixed vs sliding window: Fixed is simpler but allows 2x burst at boundaries; sliding is fairer but requires more storage (timestamps).
- Local vs distributed: Local counters are fast but inconsistent across servers; Redis provides global consistency with network latency.
- Fail open vs fail closed: Fail open avoids blocking users during outages but may allow abuse; fail closed is safer but can cause outages.
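The 2x boundary burst is easy to see numerically: with a limit of N per fixed window, N requests at the end of one window plus N at the start of the next all succeed within a span far shorter than the window. A quick demonstration, using a throwaway fixed-window counter with an assumed limit of 100 req/60s:

```python
# Demonstrate the fixed-window boundary burst: limit 100 req per 60s window.
counts = {}

def fixed_allow(now, limit=100, window=60):
    w = int(now // window)               # window 0 covers [0, 60), window 1 [60, 120)
    counts[w] = counts.get(w, 0) + 1
    return counts[w] <= limit

# 100 requests at t=59.9 (end of window 0) plus 100 at t=60.1 (start of window 1):
burst = sum(fixed_allow(59.9) for _ in range(100)) + \
        sum(fixed_allow(60.1) for _ in range(100))
print(burst)  # 200 requests allowed within 0.2 seconds, 2x the nominal limit
```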
Related System Designs
- URL Shortener (TinyURL)
- Key-Value Store (Storage)
- Notification System (Messaging)