Rate limiting algorithms explained for system design interviews and real APIs

Rate limiting protects APIs, gateways, login endpoints, payment flows, and distributed services from overload and abuse. The core design choice is not just the numeric limit; it is the algorithm that decides whether a request is accepted, queued, or rejected under bursty traffic.

Token bucket rate limiter

Token bucket rate limiting stores permission as tokens. Tokens refill at a configured rate and each request consumes one token; a request that arrives when the bucket is empty is rejected or delayed until a token is available. The bucket capacity controls burst size, while the refill rate controls the long-term throughput. This is a strong default for public APIs because it allows short bursts without allowing sustained overload.
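The refill-and-consume behavior can be sketched in a few lines. This is a minimal single-process illustration, not a production implementation; the class name TokenBucket, the method allow, and the choice to pass the current time in explicitly (so the example is deterministic) are assumptions made for this sketch.

```python
class TokenBucket:
    """Single-process token bucket sketch (hypothetical names)."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full so an initial burst is allowed
        self.last = now                 # time of the last refill calculation

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill lazily based on elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = max(self.last, now)
        # Spend tokens if enough are available; otherwise reject.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=3, refill_rate=1.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # three fit in the burst, the fourth does not
later = bucket.allow(now=1.0)                      # one token has refilled after one second
```

In a real service the caller would usually pass time.monotonic() as now, and keep one bucket per client key.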

Leaky bucket rate limiter

Leaky bucket rate limiting uses a finite queue and drains work at a steady rate. It is useful when the protected service needs smooth traffic rather than immediate burst handling. The tradeoff is latency: accepted requests may wait in the bucket before reaching the backend.
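The queue-and-drain behavior might look like the sketch below. It assumes a single process, a caller that invokes drain periodically, and explicit timestamps for determinism; LeakyBucket, offer, and drain are hypothetical names chosen for illustration.

```python
from collections import deque


class LeakyBucket:
    """Queue-based leaky bucket sketch (hypothetical names)."""

    def __init__(self, capacity: int, drain_rate: float, now: float = 0.0):
        self.capacity = capacity                # maximum queued requests
        self.drain_interval = 1.0 / drain_rate  # seconds between released requests
        self.queue = deque()
        self.next_drain = now                   # earliest time the next request may leave

    def offer(self, request, now: float) -> bool:
        # Reject when the queue is full; otherwise the request waits its turn.
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # After an idle period, do not let unused drain slots accumulate.
            self.next_drain = max(self.next_drain, now)
        self.queue.append(request)
        return True

    def drain(self, now: float) -> list:
        # Release every queued request whose drain slot has come due.
        released = []
        while self.queue and self.next_drain <= now:
            released.append(self.queue.popleft())
            self.next_drain += self.drain_interval
        return released


bucket = LeakyBucket(capacity=2, drain_rate=1.0)
accepted = [bucket.offer(name, now=0.0) for name in ("a", "b", "c")]
first = bucket.drain(now=0.0)   # one request leaves immediately
second = bucket.drain(now=0.5)  # nothing is due yet
third = bucket.drain(now=1.0)   # the next slot has arrived
```

The gap between offer and the matching drain is the queueing latency described above.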

Fixed window rate limiter

Fixed window counters divide time into discrete windows and allow a fixed number of requests in each window. They are cheap to implement with a counter and expiry, but can permit a boundary burst: a client that sends a full quota at the end of one window and another at the start of the next can get up to twice the limit through in a very short span.
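A small in-memory sketch makes the boundary behavior concrete. The names FixedWindowCounter and allow are hypothetical, windows are assumed to be aligned to multiples of window_seconds, and timestamps are passed in so the result is reproducible.

```python
class FixedWindowCounter:
    """In-memory fixed window counter sketch (hypothetical names)."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = {}  # key -> (window index, requests seen in that window)

    def allow(self, key: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        seen_window, count = self.counts.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self.counts[key] = (window, count + 1)
        return True


limiter = FixedWindowCounter(limit=3, window_seconds=10)
end_of_window = [limiter.allow("client", now=9.9) for _ in range(4)]
start_of_next = [limiter.allow("client", now=10.0) for _ in range(3)]
# Six requests are accepted within 0.1 seconds even though the limit is 3 per window.
```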

Sliding window rate limiter

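Sliding window rate limiting counts requests over a rolling interval instead of resetting abruptly. A sliding window log stores accepted timestamps and expires them individually, which improves fairness but costs more memory. Large systems often approximate this with sliding window counters, which keep only the current and previous window counts and weight the previous count by how much of that window still overlaps the rolling interval.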
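Both variants can be sketched side by side. The class names below are hypothetical, each limiter tracks a single client, and the counter variant uses the common weighted-previous-window estimate, which is an approximation rather than an exact count.

```python
from collections import deque


class SlidingWindowLog:
    """Exact rolling count: one stored timestamp per accepted request."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_seconds
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()  # expire timestamps that left the rolling window
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True


class SlidingWindowCounter:
    """Approximation: two counters instead of a full timestamp log."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = 0
        self.current_count = 0
        self.previous_count = 0

    def allow(self, now: float) -> bool:
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # Carry the old count forward only if the windows are adjacent.
            adjacent = window == self.current_window + 1
            self.previous_count = self.current_count if adjacent else 0
            self.current_count = 0
            self.current_window = window
        # Weight the previous window by the share still inside the rolling interval.
        elapsed = (now % self.window_seconds) / self.window_seconds
        estimate = self.previous_count * (1.0 - elapsed) + self.current_count
        if estimate + 1 > self.limit:
            return False
        self.current_count += 1
        return True


log = SlidingWindowLog(limit=3, window_seconds=10)
counter = SlidingWindowCounter(limit=3, window_seconds=10)
for limiter in (log, counter):
    for _ in range(3):
        limiter.allow(now=9.9)
# Unlike a fixed window, neither limiter accepts a fresh burst at now=10.0.
```

The log costs memory proportional to the limit per client, while the counter needs only two integers and a window index.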

Token bucket vs leaky bucket vs fixed window vs sliding window

Use token bucket when users should be allowed to burst up to a safe capacity. Use leaky bucket when downstream stability and smooth output matter more than immediate completion. Use fixed window when simplicity and low storage cost matter most. Use sliding window when fairness and precision matter, especially for authentication endpoints, expensive APIs, or abuse-sensitive workloads.

How rate limiter parameters map to production design

Capacity (or limit) sets the burst size for token bucket, the queue size for leaky bucket, and the maximum number of requests allowed per window for the fixed and sliding window algorithms.
Rate per second controls token refill for token bucket and queue drain for leaky bucket.
Window seconds controls fixed and sliding window duration; shorter windows react faster, while longer windows smooth behavior.
Input requests per second represents client traffic before the limiter applies any allow, queue, or reject decision.
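To see how these parameters interact, the sketch below feeds evenly spaced requests into a token bucket and counts the outcomes. The function name simulate_token_bucket and its parameter names are hypothetical, and evenly spaced arrivals are a simplifying assumption; real traffic is burstier.

```python
def simulate_token_bucket(capacity, rate_per_second, input_rps, duration_seconds):
    """Return (accepted, rejected) for evenly spaced arrivals (illustrative only)."""
    total = int(input_rps * duration_seconds)
    # Tokens that refill between two consecutive arrivals.
    refill_per_arrival = rate_per_second / input_rps
    tokens = float(capacity)  # the bucket starts full
    accepted = 0
    for i in range(total):
        if i > 0:
            tokens = min(capacity, tokens + refill_per_arrival)
        if tokens >= 1.0:
            tokens -= 1.0
            accepted += 1
    return accepted, total - accepted


# Input traffic at four times the refill rate: the burst is absorbed, then most requests fail.
overloaded = simulate_token_bucket(capacity=10, rate_per_second=5, input_rps=20, duration_seconds=10)
# Input traffic equal to the refill rate: every request is accepted.
steady = simulate_token_bucket(capacity=10, rate_per_second=5, input_rps=5, duration_seconds=10)
```

Whatever the input rate, the number of accepted requests cannot exceed capacity plus rate per second times duration, which is 60 in the overloaded case here.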

Distributed rate limiting with Redis

In distributed systems, rate limiter state must be shared across application instances. Common designs use Redis counters for fixed windows, Redis sorted sets for sliding window logs, and atomic Lua scripts to update request counters, token state, or timestamp sets without race conditions. API gateways and service meshes often enforce these limits before requests reach application code.
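A fixed window counter with an atomic Lua script might be sketched as follows. The key prefix ratelimit:, the function name allow_request, and the script itself are illustrative assumptions; the client argument is assumed to be any object exposing an eval(script, numkeys, *keys_and_args) method, as redis-py's Redis client does.

```python
# Lua runs atomically inside Redis, so INCR and EXPIRE cannot interleave
# with another instance's update of the same key.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
if count > tonumber(ARGV[1]) then
    return 0
end
return 1
"""


def allow_request(client, client_id: str, limit: int, window_seconds: int) -> bool:
    """Return True if this request fits within the client's current window."""
    key = f"ratelimit:{client_id}"  # hypothetical key naming scheme
    # One key, followed by two arguments: the limit and the window length.
    result = client.eval(FIXED_WINDOW_LUA, 1, key, limit, window_seconds)
    return int(result) == 1
```

With redis-py, client would typically be a redis.Redis instance shared by the application. In this sketch the window starts at a client's first request and ends when the key expires, rather than being aligned to clock boundaries.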
