Rate Limiting and Throttling: Protecting Systems Under Load

Learn fixed window, sliding window, token bucket, leaky bucket, distributed Redis rate limiting, throttling, quotas, and graceful degradation strategies.

Why Rate Limiting?

Rate limiting controls how much traffic a user, client, tenant, IP, service, or route can send in a time period. It protects availability, fairness, cost, and downstream systems.

Key idea: Rate limiting is not only for abuse. It is also capacity management and fairness under load.

Rate Limiting vs Throttling

Concept	Meaning
Rate limiting	Reject or delay requests above a configured rate
Throttling	Intentionally slow down or reduce service quality
Quota	Allow a fixed amount over a longer period
Backpressure	Signal callers to slow down

Fixed Window Counter

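Counts requests in fixed time windows, such as each clock minute, and rejects requests once the count for the current window exceeds the limit.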

Pros	Cons
Simple and fast	Boundary bursts
Easy with Redis increment + TTL	Less smooth fairness

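Boundary burst problem: with a limit of 100 requests per minute, a client can send 100 requests at 10:00:59 and 100 more at 10:01:00. Both windows stay within the limit, yet the client sent 200 requests in about one second.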
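
To make the boundary burst concrete, here is a minimal in-memory sketch of a fixed window counter. The class name `FixedWindowLimiter` and the explicit `now` argument are assumptions made for illustration, and old windows are never evicted, which a real limiter would have to handle.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client_id, window_index) -> requests counted in that window.
        # Old windows are never cleaned up in this sketch.
        self.counts = defaultdict(int)

    def allow(self, client_id, now):
        # Every timestamp inside the same window maps to the same index.
        window_index = int(now // self.window_seconds)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# 100 requests in the last second of one window...
late = sum(limiter.allow("client-a", now=59.0) for _ in range(100))
# ...and 100 more in the first second of the next window.
early = sum(limiter.allow("client-a", now=60.0) for _ in range(100))
print(late + early)  # 200 requests accepted within about one second
```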

Sliding Window Log

Stores timestamps for each request and counts recent entries.

Pros	Cons
Accurate	Stores many timestamps
Smooth window	Higher memory and CPU

Sliding logs are precise, but memory grows with the number of requests inside the window, so they can become expensive at very high traffic.
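
A minimal sketch of a sliding window log for a single client follows. The class name `SlidingWindowLog` and the explicit `now` argument are illustrative assumptions; timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window_seconds` span."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # one entry per accepted request

    def allow(self, now):
        # Drop timestamps that fell out of the window (now - window, now].
        cutoff = now - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

Because every accepted request stores a timestamp, memory per client is proportional to the limit, which is the cost the table above refers to.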

Sliding Window Counter

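Approximates a sliding window by adding the current window's count to the previous window's count, weighted by how much of the previous window still overlaps the sliding window. In the formula below, overlap_ratio is 1 minus the fraction of the current window that has already elapsed.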

txt

effective_count =
  previous_window_count * overlap_ratio
  + current_window_count

Pros	Cons
Smoother than fixed window	Approximate
Much cheaper than timestamp logs	More logic than fixed counter

This is a strong default for many API gateways.
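
The weighting formula above can be sketched as follows for a single client. The class name `SlidingWindowCounter` is an assumption for illustration, and the sketch assumes `now` never moves backwards.

```python
class SlidingWindowCounter:
    """Approximate a sliding window using two fixed-window counters."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_index = None
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now):
        index = int(now // self.window_seconds)
        if self.current_index is None or index > self.current_index + 1:
            # First call, or at least one full window passed with no traffic.
            self.previous_count = 0
            self.current_count = 0
        elif index == self.current_index + 1:
            # Moved into the next window: current becomes previous.
            self.previous_count = self.current_count
            self.current_count = 0
        self.current_index = index

    def allow(self, now):
        self._roll(now)
        elapsed_fraction = (now % self.window_seconds) / self.window_seconds
        overlap_ratio = 1.0 - elapsed_fraction
        effective_count = self.previous_count * overlap_ratio + self.current_count
        if effective_count >= self.limit:
            return False
        self.current_count += 1
        return True


limiter = SlidingWindowCounter(limit=10, window_seconds=60)
first = sum(limiter.allow(30.0) for _ in range(12))   # 10 accepted, 2 rejected
# At t=75 the previous window still overlaps by 75%, so it contributes 7.5.
second = sum(limiter.allow(75.0) for _ in range(5))   # 3 accepted
```

With a plain fixed window, all 10 requests would have been accepted again at t=75; the weighted count keeps most of the previous burst in view.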

Token Bucket

Tokens refill at a fixed rate. Each request consumes a token. Bursts are allowed up to bucket capacity.

Parameter	Meaning
Refill rate	Sustained allowed rate
Bucket size	Maximum burst
Cost per request	Tokens consumed per operation

Token bucket is common because it allows short bursts without sacrificing long-term control.
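
A minimal token bucket sketch using the three parameters from the table. The class name `TokenBucket` and the explicit `now` argument are illustrative assumptions; refill is computed lazily on each call rather than by a background timer.

```python
class TokenBucket:
    """Refill at `refill_rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, refill_rate, capacity, now=0.0):
        self.refill_rate = refill_rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = now

    def allow(self, now, cost=1.0):
        # Add the tokens earned since the last call, capped at bucket size.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = max(self.last_refill, now)
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(refill_rate=2.0, capacity=5)
burst = sum(bucket.allow(0.0) for _ in range(6))      # 5: the burst drains the bucket
sustained = sum(bucket.allow(1.0) for _ in range(3))  # 2: one second refills two tokens
```

Passing a larger `cost` for expensive operations lets one bucket cover routes with different weights.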

Leaky Bucket

Requests enter a bounded queue and leave at a steady rate. When the queue is full, new requests are rejected.

Pros	Cons
Smooths traffic	Adds queueing latency
Protects downstream	Queue can hide overload

Use leaky bucket when a downstream system needs steady traffic, not bursts.
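
A minimal sketch of a leaky bucket as a bounded queue. The names `LeakyBucket`, `offer`, and `drain` are assumptions for illustration; `drain` returns each released request together with the time it was scheduled to leave, so the steady spacing is visible.

```python
from collections import deque


class LeakyBucket:
    """Queue requests and release them at `leak_rate` requests per second."""

    def __init__(self, leak_rate, capacity):
        self.leak_interval = 1.0 / leak_rate
        self.capacity = capacity
        self.queue = deque()
        self.next_leak = 0.0  # earliest time the next request may leave

    def offer(self, request, now):
        """Enqueue a request; return False if the queue is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # After an idle period, unused leak slots must not pile up.
            self.next_leak = max(self.next_leak, now)
        self.queue.append(request)
        return True

    def drain(self, now):
        """Return (release_time, request) pairs that are due by `now`."""
        released = []
        while self.queue and self.next_leak <= now:
            released.append((self.next_leak, self.queue.popleft()))
            self.next_leak += self.leak_interval
        return released


bucket = LeakyBucket(leak_rate=2.0, capacity=3)
accepted = [bucket.offer(r, 0.0) for r in "abcd"]  # the fourth request overflows
released = bucket.drain(1.0)                       # one request every 0.5 seconds
```

The queue length is worth exporting as a metric, since a persistently full queue is the hidden overload the table warns about.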

Distributed Rate Limiting

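In a horizontally scaled system, instances must share limit state. Otherwise each instance enforces the limit on its own, and a client whose requests are spread across N instances can get up to N times the intended rate.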

Redis Design

Use atomic operations or Lua scripts so increment, expiry, and decision happen together.

txt

key = rate_limit:{tenant}:{user}:{route}:{window}
increment key
set expiry if new
allow if count <= limit

Concern	Design Choice
Atomicity	Lua script or transaction
Latency	Keep Redis near gateway
Hot keys	Partition by user or tenant
Failure	Fail open or fail closed by route risk
Clock skew	Use Redis server time where possible
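
The pseudocode above can be sketched as a small Lua script that Redis runs atomically, wrapped in Python. This assumes `client` is a redis-py style object exposing `eval(script, numkeys, *keys_and_args)`; the helper names `window_key` and `allow` are hypothetical. As a simplification, the window is derived from the caller's clock rather than Redis server time.

```python
# INCR, EXPIRE, and the returned count run as one atomic unit inside Redis.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


def window_key(tenant, user, route, now, window_seconds):
    """Build rate_limit:{tenant}:{user}:{route}:{window} for the current window."""
    window = int(now // window_seconds)
    return f"rate_limit:{tenant}:{user}:{route}:{window}"


def allow(client, tenant, user, route, limit, window_seconds, now):
    """Return True if this request is within the limit for its window."""
    key = window_key(tenant, user, route, now, window_seconds)
    # One key, one argument: the TTL applied when the key is first created.
    count = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(count) <= limit
```

Wrapping the call in a try/except at the gateway is where the fail-open or fail-closed choice from the table would be applied per route.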

Limit Dimensions

Dimension	Example
Per IP	Anonymous traffic
Per user	Logged-in API usage
Per tenant	Enterprise fairness
Per route	Expensive endpoints
Per API key	Developer platform
Global	Protect entire service
Downstream-specific	Protect database or third-party API

Good systems use layered limits. For example: per-user, per-tenant, route-specific, and global emergency limits.
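
One way to sketch layered limits is to check every layer before charging any of them, so a request rejected by one layer does not consume budget in the others. The function `check_layers` and the shape of its arguments are assumptions for illustration, and time windows are left out to keep the idea visible.

```python
def check_layers(counts, limits, keys):
    """Return None if allowed, or the name of the first layer that rejects.

    counts: mutable dict mapping (layer, key) -> requests counted so far
    limits: dict mapping layer name -> maximum allowed count
    keys:   dict mapping layer name -> this request's key in that layer
    """
    # First pass: every layer must have headroom.
    for layer, key in keys.items():
        if counts.get((layer, key), 0) >= limits[layer]:
            return layer
    # Second pass: charge all layers only after all of them agreed.
    for layer, key in keys.items():
        counts[(layer, key)] = counts.get((layer, key), 0) + 1
    return None


limits = {"user": 2, "tenant": 3, "global": 100}
counts = {}
u1 = {"user": "u1", "tenant": "acme", "global": "all"}
u2 = {"user": "u2", "tenant": "acme", "global": "all"}
outcomes = [check_layers(counts, limits, k) for k in (u1, u1, u1, u2, u2)]
# outcomes == [None, None, "user", None, "tenant"]
```

Returning the rejecting layer makes it possible to tell the client, and your metrics, which limit was hit.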

Response Design

Return clear signals so clients can behave well.

http

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717305600

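Use 429 for client rate limits and 503 for temporary service overload. Retry-After is a standard HTTP header; the X-RateLimit-* headers are a widely used convention rather than a standard, so document what they mean (here X-RateLimit-Reset is a Unix timestamp). Include jitter in client retries so that clients limited at the same moment do not all retry at once.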
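
A small sketch of both sides of this exchange: building the headers on the server and choosing a retry delay on the client. The function names and the `base` and `cap` parameters are assumptions for illustration; the client waits at least the Retry-After value and adds a random share of an exponentially growing ceiling.

```python
import math
import random


def rate_limit_headers(limit, remaining, reset_epoch, now):
    """Build response headers for a rate-limited request."""
    retry_after = max(0, math.ceil(reset_epoch - now))
    return {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(reset_epoch),
    }


def retry_delay(attempt, retry_after, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Honors the server's Retry-After, then adds jitter drawn from
    [0, min(cap, base * 2**attempt)) to spread out synchronized clients.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return retry_after + rng() * ceiling


headers = rate_limit_headers(limit=100, remaining=0,
                             reset_epoch=1717305600, now=1717305570)
delay = retry_delay(attempt=2, retry_after=int(headers["Retry-After"]))
```

Here `headers["Retry-After"]` is "30", matching the response above, and `delay` falls between 30 and 34 seconds.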

Graceful Degradation

Overload Situation	Degradation
Search under load	Return cached or partial results
Analytics dashboard	Use stale data
Recommendation service	Fall back to popular items
Notification fanout	Queue and delay
Expensive export	Move to async job

Rate limiting should protect important paths while allowing non-critical features to degrade.
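
As one illustration, the recommendation row of the table could be sketched like this. The function `recommend` and its callable parameters are hypothetical; the point is that the caller always gets an answer and can see whether it was degraded.

```python
def recommend(user_id, fetch_personalized, popular_items, overloaded):
    """Return (items, mode), where mode is "full" or "degraded".

    fetch_personalized: callable taking a user id; may raise TimeoutError
    popular_items:      precomputed fallback list
    overloaded:         callable returning True when load should be shed
    """
    if overloaded():
        # Skip the expensive call entirely while the system is under load.
        return popular_items, "degraded"
    try:
        return fetch_personalized(user_id), "full"
    except TimeoutError:
        # The dependency is struggling; serve the cheap fallback instead.
        return popular_items, "degraded"
```

Exposing the mode lets dashboards track how often users receive degraded results.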

What to Remember for Interviews

Choose algorithm by behavior: fixed window is simple, token bucket supports bursts, leaky bucket smooths traffic.
Distributed limits need shared state: Redis is common, but atomicity matters.
Limit by multiple dimensions: user, tenant, route, IP, and global.
Return useful 429 responses: include retry information.
Degrade gracefully: protect critical paths and shed optional work.

Practice: Design rate limiting for a public API with free and paid tiers. Include per-key limits, burst limits, Redis state, failure behavior, and client-facing headers.
