My understanding of the rate limit and burst limit differs a bit from the explanation given by Tobias Geiselmann (the most upvoted answer).
I don't think there is any concept of concurrency per se in the way API Gateway throttling works. Requests are simply processed as fast as possible, and if your API implementation takes a long time to handle each request, there will just be more requests executing concurrently; that number of concurrent executions can easily exceed the throttling limits you set in API Gateway.
The rate limit determines the maximum sustained request rate; requests arriving faster than that start filling up your "burst bucket". The bucket behaves like a leaky bucket: it fills with one token per incoming request and drains at the rate you set as the rate limit.
So if requests keep coming in faster than the bucket's "output", the bucket eventually becomes full, and from then on the excess requests are throttled with "Too many requests" (HTTP 429) errors.
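To make this concrete, here is a minimal Python sketch of such a leaky-bucket throttle. It only illustrates the algorithm as described above; the class name, the continuous drain, and the one-token-per-request accounting are my assumptions, not AWS's actual implementation:

```python
import time


class LeakyBucketThrottle:
    """Sketch of the leaky-bucket throttle described above (not AWS code)."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate        # drain rate: the configured rate limit (RPS)
        self.burst = burst      # bucket capacity: the configured burst limit
        self.tokens = 0.0       # current fill level of the bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if a request may proceed, False if it is throttled."""
        now = time.monotonic()
        # Drain the bucket at `rate` tokens/second for the elapsed time.
        self.tokens = max(0.0, self.tokens - (now - self.last) * self.rate)
        self.last = now
        if self.tokens + 1.0 > self.burst:
            return False        # bucket full -> "Too many requests"
        self.tokens += 1.0      # each accepted request adds one token
        return True
```

Notice that the bucket tracks nothing about how long a request takes to execute once admitted, which is exactly why concurrency never enters the picture.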
For example, say you set a rate limit of 10 requests per second (RPS) with a burst limit of 100 (a short simulation replaying these numbers follows the walkthrough):

If requests keep coming in at 10 RPS or lower, the burst bucket simply stays empty; its input and output are both below the set rate limit.

Now let's say the request rate goes above 10 RPS:

The first second, 18 requests come in. The bucket drains 10 per second, so 18 - 10 = 8 tokens accumulate in the bucket.

The second second, 34 more requests come in. The bucket still drains 10 per second, so 34 - 10 = 24 more tokens accumulate, and the bucket now holds 8 + 24 = 32 tokens.

The third second, 85 more requests come in and again 10 are drained, so 85 - 10 = 75 additional tokens would accumulate. But the bucket already held 32 tokens, and 32 + 75 = 107 is higher than 100, so the last 7 requests are throttled and get a "Too many requests" response. The bucket is now full with 100 tokens.

The fourth second, 5 more requests come in. The bucket drains 10 tokens, ending up with 100 + 5 - 10 = 95 tokens. No more throttling happens.

And so on.
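To check the arithmetic, here is the same bucket replayed in discrete one-second steps with the numbers from the walkthrough (again just a sketch, using the per-second bookkeeping of the example rather than a continuous drain):

```python
rate, burst = 10, 100
tokens = 0
for second, arriving in enumerate([18, 34, 85, 5], start=1):
    # Arrivals add tokens, the drain removes `rate` tokens per second.
    level = max(0, tokens + arriving - rate)
    throttled = max(0, level - burst)   # the requests that no longer fit
    tokens = min(level, burst)          # bucket level is capped at `burst`
    print(f"second {second}: {arriving} arrived, {throttled} throttled, "
          f"bucket = {tokens}")
```

This prints bucket levels of 8, 32, 100 and 95, with 7 requests throttled in the third second, matching the walkthrough above.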
So concurrency is not really relevant here. If the requests take 15 seconds each to execute, you could very well end up with 10 RPS * 15 seconds = 150 concurrent requests even if your set limit is just 10 RPS with a burst limit of 100.
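In queueing terms this is just Little's Law: average concurrency ≈ arrival rate × average time per request. API Gateway throttling caps the arrival rate, not the time per request, so it puts no direct ceiling on concurrency.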