Rate Limiting and Throttling: Protecting Systems Under Load
Learn fixed window, sliding window, token bucket, leaky bucket, distributed Redis rate limiting, throttling, quotas, and graceful degradation strategies.
Why Rate Limiting?
Rate limiting controls how much traffic a user, client, tenant, IP, service, or route can send in a time period. It protects availability, fairness, cost, and downstream systems.
Key idea: Rate limiting is not only for abuse. It is also capacity management and fairness under load.
Rate Limiting vs Throttling
| Concept | Meaning |
|---|---|
| Rate limiting | Reject or delay requests above a configured rate |
| Throttling | Intentionally slow down or reduce service quality |
| Quota | Allow a fixed amount over a longer period |
| Backpressure | Signal callers to slow down |
Fixed Window Counter
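Counts requests per key in fixed time windows (for example, each clock minute) and rejects requests once the count reaches the limit. The counter starts over when the next window begins.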
| Pros | Cons |
|---|---|
| Simple and fast | Boundary bursts |
| Easy with Redis increment + TTL | Less smooth fairness |
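Boundary burst problem: with a limit of 100 requests per minute, a client can send 100 requests at 10:00:59 and 100 more at 10:01:00. That is 200 requests in about one second, yet neither window exceeds its limit.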
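A minimal in-memory sketch of a fixed window counter, which also reproduces the boundary burst. The class name `FixedWindowLimiter`, the `allow` method, and the explicit `now` argument are illustrative assumptions, not part of any library.

```python
import time
from collections import defaultdict
from typing import Optional


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each fixed window."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> count. Old windows are never evicted here;
        # a real store would expire them (for example with a TTL).
        self.counts = defaultdict(int)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # index of the fixed window
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


# Boundary burst: the limit is 100 per minute, yet 200 pass within one second.
limiter = FixedWindowLimiter(limit=100, window_seconds=60)
late = sum(limiter.allow("client-a", now=59.0) for _ in range(100))
early = sum(limiter.allow("client-a", now=60.0) for _ in range(100))
print(late + early)  # 200
```

Passing `now` explicitly keeps the example deterministic; in a running service the default `time.time()` would be used.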
Sliding Window Log
Stores a timestamp for each request and, on every new request, counts only the entries that fall inside the most recent window.
| Pros | Cons |
|---|---|
| Accurate | Stores many timestamps |
| Smooth window | Higher memory and CPU |
Sliding logs are precise, but memory grows with the limit and the number of keys, so they can be expensive at very high traffic.
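The idea can be sketched with one deque of timestamps per key. This is a single-process illustration; the names `SlidingWindowLog` and `allow` are assumptions, and it assumes `now` never goes backwards.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLog:
    """Allow at most `limit` requests per key in any trailing window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.logs = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        log = self.logs[key]
        cutoff = now - self.window_seconds
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= cutoff:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


log_limiter = SlidingWindowLog(limit=2, window_seconds=60)
decisions = [log_limiter.allow("u1", now=t) for t in (0.0, 30.0, 59.0, 60.0, 61.0)]
print(decisions)  # [True, True, False, True, False]
```

Memory per key is bounded by `limit` timestamps, which is the cost the table above refers to.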
Sliding Window Counter
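Approximates a sliding window by adding the current window's count to the previous window's count, weighted by the fraction of the sliding window that still overlaps the previous fixed window.
effective_count = previous_window_count * overlap_ratio + current_window_count
For example, 15 seconds into a 60-second window, overlap_ratio is 0.75.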
| Pros | Cons |
|---|---|
| Smoother than fixed window | Approximate |
| Much cheaper than timestamp logs | More logic than fixed counter |
This is a strong default for many API gateways.
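A small sketch of the weighted calculation, keeping only two counters per key. The class and field names are illustrative assumptions, and the approximation assumes requests in the previous window were spread evenly.

```python
import time
from typing import Dict, Optional, Tuple


class SlidingWindowCounter:
    """Approximate a sliding window with two fixed-window counters per key."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # key -> (window index, current count, previous count)
        self.state: Dict[str, Tuple[int, int, int]] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        idx, current, previous = self.state.get(key, (window, 0, 0))
        if window == idx + 1:      # moved into the next window
            previous, current = current, 0
        elif window > idx + 1:     # idle for more than a full window
            previous, current = 0, 0
        elapsed = (now % self.window_seconds) / self.window_seconds
        overlap_ratio = 1.0 - elapsed
        effective_count = previous * overlap_ratio + current
        allowed = effective_count + 1 <= self.limit
        if allowed:
            current += 1
        self.state[key] = (window, current, previous)
        return allowed


swc = SlidingWindowCounter(limit=10, window_seconds=60)
first = sum(swc.allow("u1", now=30.0) for _ in range(10))   # fills window 0
at_boundary = swc.allow("u1", now=60.0)                     # previous window fully weighted
halfway = sum(swc.allow("u1", now=90.0) for _ in range(10)) # previous window weighted 0.5
print(first, at_boundary, halfway)  # 10 False 5
```

Unlike the fixed window counter, the request at the window boundary is rejected because the previous window still counts in full.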
Token Bucket
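Tokens refill at a fixed rate up to the bucket capacity. Each request consumes one or more tokens, and a request that finds too few tokens is rejected or delayed. Bursts are allowed up to the bucket capacity.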
| Parameter | Meaning |
|---|---|
| Refill rate | Sustained allowed rate |
| Bucket size | Maximum burst |
| Cost per request | Tokens consumed per operation |
Token bucket is common because it allows short bursts without sacrificing long-term control.
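The three parameters in the table map directly onto a short sketch. `TokenBucket`, `allow`, and the explicit `now` argument are illustrative names; a real service would more likely read a monotonic clock than take the time as an argument.

```python
class TokenBucket:
    """Refill at `refill_rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, refill_rate: float, capacity: float, now: float = 0.0) -> None:
        self.refill_rate = refill_rate   # sustained allowed rate
        self.capacity = capacity         # maximum burst
        self.tokens = float(capacity)    # start with a full bucket
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazily add the tokens earned since the last call.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(refill_rate=1.0, capacity=5, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(6)]
print(burst)                         # [True, True, True, True, True, False]
print(bucket.allow(now=2.0))         # True: two tokens were refilled
print(bucket.allow(now=2.0, cost=3)) # False: only one token is left
```

The `cost` argument is how an expensive operation can be charged more than a cheap one against the same bucket.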
Leaky Bucket
Requests enter a bounded queue and leave at a steady rate. When the queue is full, new requests are rejected.
| Pros | Cons |
|---|---|
| Smooths traffic | Adds queueing latency |
| Protects downstream | Queue can hide overload |
Use leaky bucket when a downstream system needs steady traffic, not bursts.
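A single-threaded simulation of the queue behavior, assuming the caller polls `drain` to learn which requests may be forwarded. The names `LeakyBucket`, `offer`, and `drain` are hypothetical; a real implementation would typically use a worker or timer instead of polling.

```python
from collections import deque
from typing import Any, List


class LeakyBucket:
    """Bounded queue that releases at most `leak_rate` requests per second."""

    def __init__(self, leak_rate: float, capacity: int) -> None:
        self.interval = 1.0 / leak_rate  # minimum spacing between releases
        self.capacity = capacity
        self.queue: deque = deque()
        self.next_leak_at = 0.0          # earliest time the next request may leave

    def offer(self, request: Any, now: float) -> bool:
        if len(self.queue) >= self.capacity:
            return False                 # queue full: shed the request
        if not self.queue:
            # After an idle period, do not let unused release slots pile up.
            self.next_leak_at = max(self.next_leak_at, now)
        self.queue.append(request)
        return True

    def drain(self, now: float) -> List[Any]:
        """Return the requests whose release time has arrived."""
        released = []
        while self.queue and self.next_leak_at <= now:
            released.append(self.queue.popleft())
            self.next_leak_at += self.interval
        return released


lb = LeakyBucket(leak_rate=2.0, capacity=3)
accepted = [lb.offer(name, now=0.0) for name in ("a", "b", "c", "d")]
print(accepted)        # [True, True, True, False]
print(lb.drain(0.0))   # ['a']
print(lb.drain(1.0))   # ['b', 'c']
```

The time a request spends between `offer` and `drain` is the queueing latency listed in the table, so it is worth measuring queue depth rather than only the release rate.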
Distributed Rate Limiting
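In a horizontally scaled system, instances must share limit state. If each instance counts on its own, a client whose requests are spread across N instances can receive up to N times the configured limit.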
Redis Design
Use atomic operations or Lua scripts so increment, expiry, and decision happen together.
key = rate_limit:{tenant}:{user}:{route}:{window}
increment key
set expiry if new
allow if count <= limit
| Concern | Design Choice |
|---|---|
| Atomicity | Lua script or transaction |
| Latency | Keep Redis near gateway |
| Hot keys | Partition by user or tenant |
| Failure | Fail open or fail closed by route risk |
| Clock skew | Use Redis server time where possible |
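One way to sketch the pseudocode above is a short Lua script that increments and sets the expiry in a single atomic step, called through a client's `eval(script, numkeys, *keys_and_args)` method (the shape redis-py uses). The helper names `window_key` and `allow_request` are assumptions, and the window index is computed from a `now` value supplied by the application, so this version does not address clock skew between gateways.

```python
# Runs atomically inside Redis: increment, and set the TTL only on first use.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


def window_key(tenant: str, user: str, route: str,
               window_seconds: int, now: float) -> str:
    """Build rate_limit:{tenant}:{user}:{route}:{window}."""
    window = int(now // window_seconds)
    return f"rate_limit:{tenant}:{user}:{route}:{window}"


def allow_request(client, tenant: str, user: str, route: str,
                  limit: int, window_seconds: int, now: float) -> bool:
    """Return True if this request is within the limit for its window.

    `client` is anything with a redis-py style eval() method.
    """
    key = window_key(tenant, user, route, window_seconds, now)
    count = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(count) <= limit


# Usage with redis-py (assumes a reachable Redis server):
#   import redis, time
#   client = redis.Redis()
#   allow_request(client, "acme", "u1", "/search", 100, 60, time.time())
```

This sketch does not decide what happens when Redis is unreachable. The caller would catch the connection error and then fail open or fail closed depending on the route's risk, as the table suggests.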
Limit Dimensions
| Dimension | Example |
|---|---|
| Per IP | Anonymous traffic |
| Per user | Logged-in API usage |
| Per tenant | Enterprise fairness |
| Per route | Expensive endpoints |
| Per API key | Developer platform |
| Global | Protect entire service |
| Downstream-specific | Protect database or third-party API |
Good systems use layered limits. For example: per-user, per-tenant, route-specific, and global emergency limits.
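A rough sketch of layered checks: every layer must have room before any counter is incremented, so a request rejected by one layer does not consume budget in the others. `LayeredLimiter` and the layer names are hypothetical, and window resets are left out to keep the example short.

```python
from collections import defaultdict
from typing import Dict, Optional


class LayeredLimiter:
    """Check several limit dimensions for one request within a single window."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = limits             # layer name -> max requests per window
        self.counts = defaultdict(int)   # (layer, key) -> requests counted so far

    def check(self, keys: Dict[str, str]) -> Optional[str]:
        """Return None if allowed, otherwise the name of the rejecting layer."""
        # First pass: find any exhausted layer without changing state.
        for layer, key in keys.items():
            if self.counts[(layer, key)] >= self.limits[layer]:
                return layer
        # Second pass: charge every layer only once the request is admitted.
        for layer, key in keys.items():
            self.counts[(layer, key)] += 1
        return None


layers = LayeredLimiter({"user": 2, "tenant": 3, "global": 100})
alice = {"user": "alice", "tenant": "acme", "global": "all"}
bob = {"user": "bob", "tenant": "acme", "global": "all"}
outcome = [layers.check(alice), layers.check(alice), layers.check(alice),
           layers.check(bob), layers.check(bob)]
print(outcome)  # [None, None, 'user', None, 'tenant']
```

Returning the rejecting layer's name makes it possible to report which limit was hit in logs or response headers.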
Response Design
Return clear signals so clients can behave well.
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717305600
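Use 429 when a specific client exceeds its limit and 503 when the service as a whole is temporarily overloaded. Retry-After is a standard HTTP header; the X-RateLimit-* headers are a widely used convention rather than a standard, so document what each one means. Clients should add jitter to retries so that clients limited at the same moment do not all return at once.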
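Both sides of this exchange can be sketched briefly: the server builds the headers shown above, and the client waits for the server's hint plus a random extra delay. The function names, the 60-second backoff cap, and the choice of exponential backoff with jitter are assumptions made for illustration.

```python
import math
import random
from typing import Dict


def rate_limit_headers(limit: int, remaining: int,
                       reset_epoch: int, now: float) -> Dict[str, str]:
    """Headers for a 429 response; `reset_epoch` is when the window resets."""
    retry_after = max(0, math.ceil(reset_epoch - now))
    return {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }


def retry_delay(retry_after: float, attempt: int) -> float:
    """Client side: honor Retry-After, then add random jitter.

    The jitter range doubles with each attempt, capped at 60 seconds.
    """
    backoff = min(60.0, float(2 ** attempt))
    return retry_after + random.uniform(0.0, backoff)


headers = rate_limit_headers(limit=100, remaining=0,
                             reset_epoch=1717305600, now=1717305570)
print(headers["Retry-After"])  # 30
```

With `attempt=0` the client waits between 30 and 31 seconds in this example, so retries from many clients are spread out instead of arriving together.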
Graceful Degradation
| Overload Situation | Degradation |
|---|---|
| Search under load | Return cached or partial results |
| Analytics dashboard | Use stale data |
| Recommendation service | Fall back to popular items |
| Notification fanout | Queue and delay |
| Expensive export | Move to async job |
Rate limiting should protect important paths while allowing non-critical features to degrade.
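The first row of the table, search falling back to cached results, might look like the sketch below. `search_with_fallback`, the `overloaded` signal, and the plain dictionary cache are all hypothetical stand-ins for whatever load signal and cache a real system has.

```python
from typing import Callable, Dict, List, Tuple


def search_with_fallback(
    query: str,
    primary: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
    overloaded: Callable[[], bool],
) -> Tuple[List[str], str]:
    """Return (results, source), where source is 'fresh', 'cached', or 'empty'."""
    if not overloaded():
        results = primary(query)
        cache[query] = results   # remember for later degraded responses
        return results, "fresh"
    if query in cache:
        return cache[query], "cached"  # possibly stale, but cheap to serve
    return [], "empty"                 # shed the work instead of queueing it


cache: Dict[str, List[str]] = {}
load = {"high": False}
backend = lambda q: [f"{q}-1", f"{q}-2"]  # stand-in for the real search call

normal = search_with_fallback("shoes", backend, cache, lambda: load["high"])
load["high"] = True
degraded = search_with_fallback("shoes", backend, cache, lambda: load["high"])
missing = search_with_fallback("hats", backend, cache, lambda: load["high"])
print(normal[1], degraded[1], missing[1])  # fresh cached empty
```

Returning the source alongside the results lets the caller tell users or dashboards that the response is degraded.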
What to Remember for Interviews
- Choose algorithm by behavior: fixed window is simple, token bucket supports bursts, leaky bucket smooths traffic.
- Distributed limits need shared state: Redis is common, but atomicity matters.
- Limit by multiple dimensions: user, tenant, route, IP, and global.
- Return useful 429 responses: include retry information.
- Degrade gracefully: protect critical paths and shed optional work.
Practice: Design rate limiting for a public API with free and paid tiers. Include per-key limits, burst limits, Redis state, failure behavior, and client-facing headers.