Rate limiter playground

One clock, one event stream, and traffic patterns readers can actually provoke.

Token Bucket
Idle time stores tokens, so the first burst is graceful; sustained excess still gets rejected.
Live state (Token Bucket) at t = 0.0s: tokens 8.00 / 8; refill 4/s; burst headroom 8
Timeline: last 10 simulated seconds (-10s to 0s), currently at t = 0.0s
How this algorithm really works
Mental model

A token bucket is a savings account for request permission. Tokens arrive continuously, and a request is accepted only when it can spend one token. If traffic is quiet, tokens accumulate up to the bucket capacity. That is why the algorithm allows controlled bursts without violating the long-run rate: over any interval of T seconds it accepts at most capacity + rate × T requests.

Decision math
tokens(t) = min(8, tokens(t-Δt) + 4 × Δt). A request is allowed when tokens ≥ 1; then tokens := tokens - 1. Otherwise it is rejected immediately.
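The decision rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the playground's own code: the class name `TokenBucket`, the method `allow`, and the explicit `now` argument are assumptions made for the example, while the capacity of 8 and the refill rate of 4/s come from the live state shown earlier.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float = 8.0      # maximum stored tokens (burst headroom)
    rate: float = 4.0          # tokens added per second
    tokens: float = 8.0        # current balance; the bucket starts full
    last_refill: float = 0.0   # time of the last update, in seconds

    def allow(self, now: float) -> bool:
        """Refill lazily for the elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket()
burst = [bucket.allow(now=0.0) for _ in range(10)]   # 10 requests at t = 0
later = [bucket.allow(now=0.5) for _ in range(3)]    # 3 requests at t = 0.5
print(burst.count(True), later)
```

With a full bucket, the burst of 10 spends all 8 tokens and the last 2 requests are rejected. Half a second later only 4 × 0.5 = 2 tokens have returned, so two of the three follow-up requests pass.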
System design notes
  • Great for public APIs where short bursts are acceptable but sustained overload is not.
  • Usually implemented with {tokens, last_refill_timestamp}; no background job is required because the refill is computed from elapsed time whenever a request arrives.
  • In distributed systems, the state is commonly stored in Redis and updated atomically with Lua or server-side scripts.
Rule: allow if tokens ≥ 1, then subtract 1; refill by rate × Δt up to capacity
Try this
Pause, wait for a full bucket, then send a burst. Resume with high input to watch spare burst budget turn into steady rejection.

Rate limiting algorithms explained for system design interviews and real APIs

Rate limiting protects APIs, gateways, login endpoints, payment flows, and distributed services from overload and abuse. The core design choice is not just the numeric limit; it is the algorithm that decides whether a request is accepted, queued, or rejected under bursty traffic.

Token bucket rate limiter

Token bucket rate limiting stores permission as tokens. Tokens refill at a configured rate and each request consumes one token. The bucket capacity controls burst size, while the refill rate controls the long-term throughput. This is a strong default for public APIs because it allows short bursts without allowing sustained overload.

Leaky bucket rate limiter

Leaky bucket rate limiting uses a finite queue and drains work at a steady rate; requests that arrive while the queue is full are rejected. It is useful when the protected service needs smooth traffic rather than immediate burst handling. The tradeoff is latency: accepted requests may wait in the bucket before reaching the backend.
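One way to picture the queueing behavior is to give every accepted request a scheduled release time. The sketch below takes that approach; it is an illustrative model, not a production queue, and the names `LeakyBucket`, `offer`, and `drain_rate` are assumptions introduced here.

```python
from collections import deque
from typing import Optional


class LeakyBucket:
    def __init__(self, capacity: int, drain_rate: float) -> None:
        self.capacity = capacity            # maximum number of waiting requests
        self.interval = 1.0 / drain_rate    # seconds between released requests
        self.queue: deque = deque()         # release times of waiting requests
        self.last_release = float("-inf")   # release time of the newest request

    def offer(self, now: float) -> Optional[float]:
        """Return the scheduled release time, or None if the queue is full."""
        # Requests whose release time has passed have already left the queue.
        while self.queue and self.queue[0] <= now:
            self.queue.popleft()
        if len(self.queue) >= self.capacity:
            return None
        # Release no earlier than one drain interval after the previous request.
        release = max(now, self.last_release + self.interval)
        self.last_release = release
        self.queue.append(release)
        return release


bucket = LeakyBucket(capacity=2, drain_rate=2.0)
releases = [bucket.offer(now=0.0) for _ in range(4)]   # 4 requests at t = 0
print(releases)
```

Here the first request passes straight through, the next two wait 0.5s and 1.0s, and the fourth is rejected because two requests are already waiting. The gap between `now` and the returned release time is the latency cost described above.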

Fixed window rate limiter

Fixed window counters divide time into discrete windows and allow a fixed number of requests in each window. They are cheap to implement with a counter and expiry, but can permit a boundary burst: a client that spends its full quota at the end of one window and again at the start of the next gets up to twice the limit through in a span much shorter than one window.
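The boundary burst is easy to reproduce in a small sketch. The class name `FixedWindowCounter` and the limit of 5 requests per 10-second window are assumptions chosen for the example.

```python
class FixedWindowCounter:
    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_index = None   # which window the counter belongs to
        self.count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window_seconds)
        if index != self.window_index:   # a new window started: reset the counter
            self.window_index = index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(limit=5, window_seconds=10.0)
late = [limiter.allow(9.9) for _ in range(5)]     # end of the first window
early = [limiter.allow(10.1) for _ in range(5)]   # start of the second window
print(sum(late) + sum(early))
```

All ten requests are accepted within 0.2 seconds even though the configured limit is 5 per 10 seconds, because the counter resets at t = 10.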

Sliding window rate limiter

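Sliding window rate limiting counts requests over a rolling interval instead of resetting abruptly. A sliding window log stores accepted timestamps and expires them individually, which improves fairness but costs memory proportional to the limit for each client. Large systems often approximate this with sliding window counters, which weight the previous fixed window's count by how much of that window still overlaps the rolling interval.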
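Both variants can be sketched briefly. The names `SlidingWindowLog` and `sliding_counter_estimate` are hypothetical, and the weighting formula in the second function is one common form of the counter approximation, shown here as an assumption rather than as the only way to do it.

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.accepted: deque = deque()   # accepted timestamps, oldest first

    def allow(self, now: float) -> bool:
        # Expire timestamps that have left the rolling interval.
        while self.accepted and self.accepted[0] <= now - self.window_seconds:
            self.accepted.popleft()
        if len(self.accepted) < self.limit:
            self.accepted.append(now)
            return True
        return False


def sliding_counter_estimate(prev_count: int, curr_count: int,
                             now: float, window_seconds: float) -> float:
    """Approximate the rolling count from two fixed-window counters."""
    elapsed_fraction = (now % window_seconds) / window_seconds
    # The previous window contributes only the part that still overlaps.
    return prev_count * (1.0 - elapsed_fraction) + curr_count


log = SlidingWindowLog(limit=5, window_seconds=10.0)
late = [log.allow(9.9) for _ in range(5)]
early = [log.allow(10.1) for _ in range(5)]
print(sum(late), sum(early), sliding_counter_estimate(5, 0, 12.5, 10.0))
```

The same two bursts of five are sent at t = 9.9 and t = 10.1. The log rejects the second burst, since the first five timestamps are still inside the rolling 10-second interval. The counter estimate stores only two integers per client and trades that saving for an approximation.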

Token bucket vs leaky bucket vs fixed window vs sliding window

Use token bucket when users should be allowed to burst up to a safe capacity. Use leaky bucket when downstream stability and smooth output matter more than immediate completion. Use fixed window when simplicity and low storage cost matter most. Use sliding window when fairness and precision matter, especially for authentication endpoints, expensive APIs, or abuse-sensitive workloads.

How rate limiter parameters map to production design

  • Capacity or limit controls burst size for token bucket, queue size for leaky bucket, or the maximum requests allowed per window for fixed and sliding windows.
  • Rate per second controls token refill for token bucket and queue drain for leaky bucket.
  • Window seconds controls fixed and sliding window duration; shorter windows react faster, while longer windows smooth behavior.
  • Input requests per second represents client traffic before the limiter applies any allow, queue, or reject decision.

Distributed rate limiting with Redis

In distributed systems, rate limiter state must be shared across application instances. Common designs use Redis counters for fixed windows, Redis sorted sets for sliding window logs, and atomic Lua scripts to update request counters, token state, or timestamp sets without race conditions. API gateways and service meshes often enforce these limits before requests reach application code.
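As a rough sketch of the Redis counter approach, the snippet below wraps a short Lua script that increments a key and sets its expiry in one atomic step. Several things here are assumptions for illustration: the `ratelimit:` key prefix, the function name `allow`, and a `client` object that exposes redis-py's `eval(script, numkeys, *keys_and_args)` method. The window in this variant starts at a client's first request rather than on a clock boundary.

```python
# Lua runs atomically inside Redis, so the increment and the expiry
# cannot interleave with another client's update.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return count
"""


def allow(client, client_id: str, limit: int, window_seconds: int) -> bool:
    """Return True while the caller is within its per-window limit.

    `client` is assumed to behave like a redis-py connection.
    """
    key = f"ratelimit:{client_id}"   # hypothetical key naming scheme
    count = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(count) <= limit
```

Because every application instance runs the same script against the same key, they all see one shared counter. A token bucket or a sorted-set log could follow the same pattern with a longer script.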