Scalability & Performance

Rate Limiting and Throttling: Protecting Systems Under Load

Learn fixed window, sliding window, token bucket, leaky bucket, distributed Redis rate limiting, throttling, quotas, and graceful degradation strategies.

rate limiting, token bucket, throttling, sliding window, Redis

Why Rate Limiting?

Rate limiting controls how much traffic a user, client, tenant, IP, service, or route can send in a time period. It protects availability, fairness, cost, and downstream systems.

Key idea: Rate limiting is not only for abuse. It is also capacity management and fairness under load.


Rate Limiting vs Throttling

| Concept | Meaning |
| --- | --- |
| Rate limiting | Reject or delay requests above a configured rate |
| Throttling | Intentionally slow down or reduce service quality |
| Quota | Allow a fixed amount over a longer period |
| Backpressure | Signal callers to slow down |

Fixed Window Counter

Counts requests in fixed time windows.

| Pros | Cons |
| --- | --- |
| Simple and fast | Boundary bursts |
| Easy with Redis increment + TTL | Less smooth fairness |

Boundary burst problem: with a limit of 100 requests per minute, a client can send 100 requests at 10:00:59 and 100 more at 10:01:00, so 200 requests pass in about one second.
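The boundary burst can be reproduced with a small in-memory sketch. The class name `FixedWindowLimiter`, the 100-per-minute limit, and the explicit `now` argument (used instead of a real clock so the timing is repeatable) are assumptions for illustration, not part of any library.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Count requests per key inside fixed, clock-aligned windows."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> request count. Old windows are never purged here.
        self.counts = defaultdict(int)

    def allow(self, key, now):
        window = int(now // self.window_seconds)
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


limiter = FixedWindowLimiter(limit=100, window_seconds=60)
# 100 requests in the last second of one window, 100 more at the start of the next.
late = sum(limiter.allow("client-a", now=59.0) for _ in range(100))
early = sum(limiter.allow("client-a", now=60.0) for _ in range(100))
burst_total = late + early  # every request is accepted
```

Both batches are accepted because each falls in a different window, even though they arrive one second apart.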


Sliding Window Log

Stores timestamps for each request and counts recent entries.

| Pros | Cons |
| --- | --- |
| Accurate | Stores many timestamps |
| Smooth window | Higher memory and CPU |

Sliding logs give exact counts, but they store one timestamp per request inside the window, which can be expensive at very high traffic.
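A minimal in-memory sketch of the timestamp log, assuming one deque per key and a caller-supplied `now` value. The class name `SlidingWindowLog` is hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLog:
    """Keep one timestamp per accepted request and count the recent ones."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = defaultdict(deque)  # key -> timestamps, oldest first

    def allow(self, key, now):
        timestamps = self.log[key]
        # Drop entries that have aged out of the window ending at `now`.
        while timestamps and timestamps[0] <= now - self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


log_limiter = SlidingWindowLog(limit=2, window_seconds=60)
decisions = [log_limiter.allow("client-a", now=t) for t in (0, 30, 59, 60, 61)]
```

Memory per key grows up to `limit` timestamps, which is the cost listed in the table above.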


Sliding Window Counter

Approximates a sliding window by weighting the previous window and current window.

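```txt
effective_count =
  previous_window_count * overlap_ratio
  + current_window_count
```

Here overlap_ratio is the fraction of the previous window that still falls inside the sliding window, that is, 1 minus the fraction of the current window that has already elapsed.

| Pros | Cons |
| --- | --- |
| Smoother than fixed window | Approximate |
| Much cheaper than timestamp logs | More logic than fixed counter |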

This is a strong default for many API gateways.
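The weighted count can be sketched in memory as follows. The class name `SlidingWindowCounter` and the choice to reject when the weighted count plus the new request would exceed the limit are assumptions. Real gateways may round or compare differently.

```python
class SlidingWindowCounter:
    """Approximate a sliding window from two fixed-window counters."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = {}  # (key, window index) -> count

    def allow(self, key, now):
        window = int(now // self.window_seconds)
        # Share of the previous window still covered by the sliding window.
        overlap_ratio = 1.0 - (now % self.window_seconds) / self.window_seconds
        previous = self.counts.get((key, window - 1), 0)
        current = self.counts.get((key, window), 0)
        effective_count = previous * overlap_ratio + current
        if effective_count + 1 > self.limit:
            return False
        self.counts[(key, window)] = current + 1
        return True


swc = SlidingWindowCounter(limit=100, window_seconds=60)
first = sum(swc.allow("client-a", now=59.0) for _ in range(100))
at_boundary = sum(swc.allow("client-a", now=60.0) for _ in range(100))
mid_window = sum(swc.allow("client-a", now=90.0) for _ in range(60))
```

In this run the burst right at the window boundary is rejected, and halfway through the next window only half of the previous count still weighs against the client.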


Token Bucket

Tokens refill at a fixed rate. Each request consumes a token. Bursts are allowed up to bucket capacity.

| Parameter | Meaning |
| --- | --- |
| Refill rate | Sustained allowed rate |
| Bucket size | Maximum burst |
| Cost per request | Tokens consumed per operation |

Token bucket is common because it allows short bursts without sacrificing long-term control.
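A minimal sketch of the three parameters above, assuming lazy refill (tokens are topped up when a request arrives rather than by a background timer). The class name `TokenBucket` and the explicit `now` argument are illustrative choices.

```python
class TokenBucket:
    """Refill lazily on each call; spend `cost` tokens per request."""

    def __init__(self, refill_rate, capacity, now=0.0):
        self.refill_rate = refill_rate  # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.updated_at = now

    def allow(self, now, cost=1.0):
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(refill_rate=1.0, capacity=5)
burst = [bucket.allow(now=0.0) for _ in range(6)]   # burst of 5, then empty
later = [bucket.allow(now=2.0) for _ in range(3)]   # 2 tokens refilled
expensive = bucket.allow(now=10.0, cost=3.0)        # refilled to capacity, costs 3
```

Charging a higher `cost` for expensive operations lets one bucket cover routes with different weights.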


Leaky Bucket

Requests enter a queue and leave at a steady rate.

| Pros | Cons |
| --- | --- |
| Smooths traffic | Adds queueing latency |
| Protects downstream | Queue can hide overload |

Use leaky bucket when a downstream system needs steady traffic, not bursts.
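One way to sketch the queue-and-drain behavior, assuming a bounded queue that rejects new requests when full and a `drain` method that a worker would call periodically. The names `LeakyBucket`, `offer`, and `drain` are hypothetical.

```python
from collections import deque


class LeakyBucket:
    """Queue requests and release them at a steady rate."""

    def __init__(self, leak_rate, capacity):
        self.interval = 1.0 / leak_rate  # seconds between released requests
        self.capacity = capacity         # maximum queued requests
        self.queue = deque()
        self.next_release = 0.0

    def offer(self, request, now):
        """Queue the request, or return False when the queue is full."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # An idle bucket does not bank release slots from the past.
            self.next_release = max(self.next_release, now)
        self.queue.append(request)
        return True

    def drain(self, now):
        """Return the requests that are due to leave by `now`."""
        released = []
        while self.queue and self.next_release <= now:
            released.append(self.queue.popleft())
            self.next_release += self.interval
        return released


leaky = LeakyBucket(leak_rate=2, capacity=3)
accepted = [leaky.offer(name, now=0.0) for name in ("a", "b", "c", "d")]
out_at_0 = leaky.drain(now=0.0)
out_at_1 = leaky.drain(now=1.0)
```

The fourth request is rejected because the queue is full, and the queued ones leave half a second apart regardless of how quickly they arrived.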


Distributed Rate Limiting

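In a horizontally scaled system, instances must share limit state. Otherwise each instance enforces the limit on its own, and the effective limit grows with the number of instances.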

Redis Design

Use atomic operations or Lua scripts so increment, expiry, and decision happen together.

```txt
key = rate_limit:{tenant}:{user}:{route}:{window}
increment key
set expiry if new
allow if count <= limit
```

| Concern | Design Choice |
| --- | --- |
| Atomicity | Lua script or transaction |
| Latency | Keep Redis near gateway |
| Hot keys | Partition by user or tenant |
| Failure | Fail open or fail closed by route risk |
| Clock skew | Use Redis server time where possible |
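The Redis steps described in this section could be sketched as a short Lua script called from Python. This assumes `client` is a redis-py client, whose `eval(script, numkeys, *keys_and_args)` method runs the script atomically on the server. The helper names `window_key` and `allow` are hypothetical.

```python
import time

# Runs atomically in Redis: increment, set the TTL on first use, return the count.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


def window_key(tenant, user, route, window_seconds, now):
    """Build the per-window key in the layout shown above."""
    window = int(now // window_seconds)
    return f"rate_limit:{tenant}:{user}:{route}:{window}"


def allow(client, tenant, user, route, limit, window_seconds, now=None):
    """Return True while the shared counter for this window is within the limit.

    `client` is assumed to be a redis-py client exposing eval().
    """
    now = time.time() if now is None else now
    key = window_key(tenant, user, route, window_seconds, now)
    count = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(count) <= limit
```

This sketch derives the window from the caller's clock, so gateways with skewed clocks can disagree about which window is current. It also does not handle Redis errors, which is where the fail-open or fail-closed decision would go.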

Limit Dimensions

| Dimension | Example |
| --- | --- |
| Per IP | Anonymous traffic |
| Per user | Logged-in API usage |
| Per tenant | Enterprise fairness |
| Per route | Expensive endpoints |
| Per API key | Developer platform |
| Global | Protect entire service |
| Downstream-specific | Protect database or third-party API |

Good systems use layered limits. For example: per-user, per-tenant, route-specific, and global emergency limits.
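A layered check might look like the sketch below, assuming simple per-window counters held in a dict and a check-then-charge order so a rejected request does not consume budget in the layers that had room. The function name `check_layers` and the key names are hypothetical.

```python
def check_layers(counts, layers):
    """Allow only if every layer has room, then charge all layers together.

    counts: dict of current usage per key in the current window.
    layers: list of (key, limit) pairs, e.g. user, tenant, route, global.
    Returns the key of the first exhausted layer, or None if allowed.
    """
    for key, limit in layers:
        if counts.get(key, 0) >= limit:
            return key
    for key, _ in layers:
        counts[key] = counts.get(key, 0) + 1
    return None


usage = {}
u1_layers = [("user:u1", 2), ("tenant:acme", 3), ("global", 100)]
u2_layers = [("user:u2", 2), ("tenant:acme", 3), ("global", 100)]
results = [
    check_layers(usage, u1_layers),
    check_layers(usage, u1_layers),
    check_layers(usage, u1_layers),  # user limit reached
    check_layers(usage, u2_layers),
    check_layers(usage, u2_layers),  # tenant limit reached
]
```

Returning the exhausted layer makes it possible to tell the client which limit was hit.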


Response Design

Return clear signals so clients can behave well.

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1717305600
```

Use 429 for client rate limits and 503 for temporary service overload. Include jitter in client retries.
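Both sides of this exchange can be sketched briefly: the server builds the headers, and the client waits for `Retry-After` plus a random backoff. The function names, the 60-second cap, and the full-jitter exponential backoff are assumptions for illustration. The `X-RateLimit-*` names follow the response shown above.

```python
import math
import random


def rate_limit_headers(limit, remaining, reset_epoch, now):
    """Server side: headers for a 429 response."""
    retry_after = max(0, math.ceil(reset_epoch - now))
    return {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }


def retry_delay(retry_after, attempt, rng=random, cap=60.0):
    """Client side: honor Retry-After, then add a random share of the backoff."""
    backoff = min(cap, 2.0 ** attempt)
    return retry_after + rng.uniform(0, backoff)


headers = rate_limit_headers(100, 0, reset_epoch=1717305600, now=1717305570)
delay = retry_delay(int(headers["Retry-After"]), attempt=3, rng=random.Random(1))
```

The random component spreads retries out so that clients limited at the same moment do not all return at the same moment.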


Graceful Degradation

| Overload Situation | Degradation |
| --- | --- |
| Search under load | Return cached or partial results |
| Analytics dashboard | Use stale data |
| Recommendation service | Fall back to popular items |
| Notification fanout | Queue and delay |
| Expensive export | Move to async job |

Rate limiting should protect important paths while allowing non-critical features to degrade.
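The recommendation row of the table could be sketched as a fallback wrapper. The function `recommend`, the `overloaded` callback, and the `degraded` flag in the response are hypothetical names, and treating `TimeoutError` as the overload signal is an assumption.

```python
def recommend(user_id, personalized, popular_items, overloaded):
    """Serve personalized results unless the service is shedding load."""
    if overloaded():
        # Skip the expensive call entirely while the system is under pressure.
        return {"items": popular_items, "degraded": True}
    try:
        return {"items": personalized(user_id), "degraded": False}
    except TimeoutError:
        # The dependency is struggling: fall back instead of failing the request.
        return {"items": popular_items, "degraded": True}


POPULAR = ["item-1", "item-2"]


def slow_backend(user_id):
    raise TimeoutError("recommendation service timed out")


normal = recommend("u1", lambda uid: [f"for-{uid}"], POPULAR, overloaded=lambda: False)
shed = recommend("u1", lambda uid: [f"for-{uid}"], POPULAR, overloaded=lambda: True)
timed_out = recommend("u1", slow_backend, POPULAR, overloaded=lambda: False)
```

Marking the response as degraded lets clients and dashboards distinguish a fallback from a normal result.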


What to Remember for Interviews

  1. Choose algorithm by behavior: fixed window is simple, token bucket supports bursts, leaky bucket smooths traffic.
  2. Distributed limits need shared state: Redis is common, but atomicity matters.
  3. Limit by multiple dimensions: user, tenant, route, IP, and global.
  4. Return useful 429 responses: include retry information.
  5. Degrade gracefully: protect critical paths and shed optional work.

Practice: Design rate limiting for a public API with free and paid tiers. Include per-key limits, burst limits, Redis state, failure behavior, and client-facing headers.