Rate Limiter
Overview
A Rate Limiter system design question tests your understanding of distributed systems, consistency, and real-time decision-making under load. Companies like Stripe, Uber, and Netflix ask it because rate limiting is critical for API protection, fair usage, and cost control. The challenge lies in enforcing limits across distributed servers without a single point of failure, and in choosing among algorithms (fixed window, sliding window, token bucket, leaky bucket) with different accuracy, burst, and storage trade-offs. This design matters in interviews because it combines those algorithms with systems concepts like Redis, distributed locks, and eventual consistency. Demonstrating knowledge of when to use local vs. global limits and how to handle clock skew shows senior-level thinking.
Requirements
Functional
- Limit number of requests per user/IP per time window
- Support multiple algorithms: fixed window, sliding window, token bucket, leaky bucket
- Return 429 Too Many Requests when limit exceeded
- Include Retry-After header in 429 response
- Support different limits per API endpoint or user tier
- Allow whitelisting of certain clients (e.g., internal services)
Non-Functional
- Low latency — rate limit check must add <10ms overhead
- High availability — rate limiter failure should not block requests (fail open vs fail closed)
- Accuracy — minimize false positives/negatives at window boundaries
- Scalability — support millions of unique keys (user IDs, IPs)
Capacity Estimation
Assume 10M unique keys (user IDs, IPs) and an aggregate peak of ~10K requests/s across all servers (only a small fraction of users are active at once). Each request needs one rate-limit check, so ~10K ops/s against the distributed store. A single Redis node can handle 100K+ ops/s, leaving ample headroom; shard by key hash if traffic grows.
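These estimates can be sanity-checked with quick arithmetic (the per-key memory figure is an assumption, not a measurement):

```python
# Back-of-envelope capacity check for the rate limiter.
unique_keys = 10_000_000        # ~10M users/IPs tracked
peak_checks_per_sec = 10_000    # aggregate rate-limit checks at peak
redis_ops_per_sec = 100_000     # conservative single-node Redis throughput

# Each check is roughly one Redis op (an INCR or one Lua script call).
utilization = peak_checks_per_sec / redis_ops_per_sec
print(f"Redis utilization at peak: {utilization:.0%}")  # 10%

# Memory: each key holds a small counter plus TTL, ~100 bytes with overhead.
memory_gb = unique_keys * 100 / 1e9
print(f"Approx. memory for counters: {memory_gb:.0f} GB")  # 1 GB
```

At 10% utilization a single primary with a replica for failover is enough; the memory footprint fits comfortably in one node.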
Architecture Diagram
(Diagram omitted. Flow: Client → Rate Limit Middleware → Rate Limit Service → Redis/Distributed Store, with the Configuration Service and Metrics & Alerts as supporting components.)
Component Deep Dive
Rate Limit Middleware
Intercepts each request before it reaches the application. Extracts user/key, calls Rate Limit Service, and returns 429 or forwards request.
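The middleware's control flow can be sketched as follows. This is a minimal illustration, not a real framework API: the request shape, the `check_rate_limit` callable, and its `(allowed, retry_after, remaining)` return tuple are all assumptions made for the example.

```python
def rate_limit_middleware(request, check_rate_limit, handler):
    """Intercept a request; return 429 with Retry-After if over the limit."""
    # Extract the rate-limit key (user ID, falling back to IP).
    key = request.get("user_id") or request.get("ip")
    allowed, retry_after, remaining = check_rate_limit(key)
    if not allowed:
        # Limit exceeded: short-circuit with 429 and a Retry-After hint.
        return {"status": 429, "headers": {"Retry-After": str(retry_after)}}
    # Under the limit: forward to the application and annotate the response.
    response = handler(request)
    response.setdefault("headers", {})["X-RateLimit-Remaining"] = str(remaining)
    return response

# Usage with a stub limiter that always allows, 99 requests remaining:
resp = rate_limit_middleware(
    {"user_id": "u1"},
    check_rate_limit=lambda key: (True, 0, 99),
    handler=lambda req: {"status": 200},
)
print(resp["status"])  # 200
```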
Rate Limit Service
Implements the chosen algorithm (e.g., sliding window with Redis). Checks current count, increments if under limit, returns allow/deny.
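The token bucket variant, for instance, can be sketched in memory; a Redis-backed version keeps the same `(tokens, last_refill)` state per key. Names and the injectable `now` clock (handy for tests) are illustrative choices, not a standard API.

```python
import time

class TokenBucket:
    """In-memory token bucket: refills `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, allowing an initial burst
        self.last = time.monotonic() if now is None else now

    def allow(self, cost=1.0, now=None):
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Burst of 2 allowed immediately; third denied; one token refills after 1s.
bucket = TokenBucket(rate=1.0, capacity=2.0, now=0.0)
print([bucket.allow(now=0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(now=1.0))                      # True
```

The `capacity` parameter controls burst size while `rate` controls the sustained limit, which is the main reason interviewers like this algorithm over a plain counter.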
Redis / Distributed Store
Stores key → (count, window_start) or token bucket state. Uses INCR, EXPIRE for fixed window; Lua scripts for sliding window atomicity.
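The fixed-window INCR/EXPIRE pattern maps to roughly the following logic, sketched here in memory for clarity. With real Redis the read-modify-write below is a single atomic `INCR`, the window boundary is enforced by `EXPIRE` rather than a window ID in the key, and the sliding-window variant wraps its steps in a Lua script so they execute atomically.

```python
class FixedWindowCounter:
    """In-memory analogue of the Redis pattern key = "user:window", INCR + EXPIRE."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window_secs = window_secs
        self.counts = {}  # (key, window_id) -> count; TTL-based pruning omitted

    def allow(self, key, now):
        window_id = int(now // self.window_secs)  # which fixed window `now` falls in
        slot = (key, window_id)
        self.counts[slot] = self.counts.get(slot, 0) + 1  # the INCR step
        return self.counts[slot] <= self.limit

limiter = FixedWindowCounter(limit=2, window_secs=60)
print([limiter.allow("u1", now=t) for t in (0, 1, 2)])  # [True, True, False]
print(limiter.allow("u1", now=61))                      # True (new window)
```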
Configuration Service
Stores limit rules per endpoint, user tier, or key prefix. Allows dynamic updates without redeployment.
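A rule set might look like the following; the field names and schema are illustrative, not a standard format.

```yaml
# Hypothetical rate-limit configuration, matched by endpoint and user tier
rules:
  - match: { endpoint: "/api/search", tier: "free" }
    algorithm: sliding_window
    limit: 100          # requests per window
    window_secs: 60
  - match: { endpoint: "/api/search", tier: "pro" }
    algorithm: token_bucket
    rate_per_sec: 50
    burst: 200
whitelist:
  - "internal-billing-service"   # bypasses all limits
```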
Metrics & Alerts
Tracks rate limit hits, latency, and error rates. Alerts when limits are too aggressive or Redis is overloaded.
Database Design
Redis is the primary store: key = user_id:endpoint:window, value = count, TTL = window size. For sliding window log, use sorted set with timestamps. No traditional DB needed for core logic; config can live in config service or DB.
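The sorted-set sliding-window log corresponds to a ZADD (record the timestamp), ZREMRANGEBYSCORE (evict entries older than the window), and ZCARD (count what remains). An in-memory equivalent, with class and method names chosen for the example:

```python
import bisect

class SlidingWindowLog:
    """Per-key timestamp log, the in-memory analogue of a Redis sorted set."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window_secs = window_secs
        self.logs = {}  # key -> sorted list of request timestamps

    def allow(self, key, now):
        log = self.logs.setdefault(key, [])
        # Evict timestamps outside the window (ZREMRANGEBYSCORE -inf..cutoff).
        cutoff = now - self.window_secs
        del log[:bisect.bisect_right(log, cutoff)]
        if len(log) >= self.limit:  # the ZCARD check
            return False
        bisect.insort(log, now)     # the ZADD step
        return True

sw = SlidingWindowLog(limit=2, window_secs=10.0)
print([sw.allow("u1", t) for t in (0.0, 5.0, 9.0, 11.0)])  # [True, True, False, True]
```

The storage cost — one timestamp per request per key — is exactly the sliding-window overhead mentioned in the trade-offs below; in Redis these steps must run inside a Lua script to stay atomic under concurrent requests.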
API Design
| Method | Path | Description |
|---|---|---|
| GET | /api/check | Internal: check if a request is allowed. Returns 200 with `X-RateLimit-Remaining`, or 429. |
| GET | /api/limits | Get current rate limit status for a user (remaining, reset time). |
| POST | /admin/limits | Update rate limit rules (admin only). |
Scalability & Trade-offs
- Fixed vs sliding window: Fixed is simpler but allows 2x burst at boundaries; sliding is fairer but requires more storage (timestamps).
- Local vs distributed: Local counters are fast but inconsistent across servers; Redis provides global consistency with network latency.
- Fail open vs fail closed: Fail open avoids blocking users during outages but may allow abuse; fail closed is safer but can cause outages.
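The 2x boundary burst is easy to see numerically: with a limit of N per fixed window, N requests at the end of one window plus N at the start of the next all succeed within a span far shorter than the window. A quick demonstration, using a throwaway fixed-window counter with an assumed limit of 100 req/60s:

```python
# Demonstrate the fixed-window boundary burst: limit 100 req per 60s window.
counts = {}

def fixed_allow(now, limit=100, window=60):
    w = int(now // window)               # window 0 covers [0, 60), window 1 [60, 120)
    counts[w] = counts.get(w, 0) + 1
    return counts[w] <= limit

# 100 requests at t=59.9 (end of window 0) plus 100 at t=60.1 (start of window 1):
burst = sum(fixed_allow(59.9) for _ in range(100)) + \
        sum(fixed_allow(60.1) for _ in range(100))
print(burst)  # 200 requests allowed within 0.2 seconds, 2x the nominal limit
```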
Related System Designs
- URL Shortener (TinyURL)
- Key-Value Store (Storage)
- Notification System (Messaging)