I was researching rate limiting algorithms for a project when I came across a couple of articles talking about the Generic Cell Rate Algorithm (GCRA).
The name “cell” comes from a communications technology called Asynchronous Transfer Mode (ATM) which encoded data into small packets of fixed size called “cells” – https://brandur.org/rate-limiting
GCRA is a leaky bucket-type scheduling algorithm which works by tracking the remaining limit through a single timestamp value called the theoretical arrival time (TAT), which is seeded on the first request. The TAT represents the next time at which the client will be allowed to make a request – if it’s in the past, their requests are allowed; if it’s in the future, they’ve hit their limit.
For the following example let’s assume that we have a rate limit of 1 request every ten minutes (i.e. 6 requests per hour) with a burst capacity of 6 requests. Our initial request will set a TAT of 50 minutes in the past, which is equal to 5 remaining requests:
If we immediately make another request we’ll simply consume more of our capacity. The new TAT will be calculated by incrementing the previous TAT by 10 minutes to 40 minutes in the past:
If we continue our burst, we’ll exhaust all 6 of our allowed requests and our TAT will be set 10 minutes in the future:
At this point a 7th request will be refused until the request time is at least equal to the current TAT. Once we’ve waited 10 minutes, we make another request, and our new TAT is set to 10 minutes in the future:
If we wait 2 hours, would we be allowed to send a burst of 20 requests? No. The algorithm accounts for this by comparing the existing capacity to the configured burst capacity and removing any excess before assessing the cost of a given request.
In our example, after waiting 2 hours, the previous TAT would first be incremented by 50 minutes to reduce the capacity to the maximum allowed for our bucket (6 requests), then incremented by 10 minutes for the request being made, setting our new TAT to 50 minutes in the past:
The Go package github.com/throttled/throttled implements the core of this algorithm if you want to try it out for yourself.