
I'm trying to figure out where the latency in my calls is coming from, please let me know if any of this information could be presented in a format that is more clear!

Some background: I have two systems--System A and System B. I manually (through Postman) hit an endpoint on System A that invokes an endpoint on System B. System A is hosted on an EC2 instance.

  • When System B is hosted on a Lambda function behind API Gateway, the latency for the call is 125 ms.
  • When System B is hosted on an EC2 instance, the latency for the call is 8 ms.
  • When System B is hosted on an EC2 instance behind API Gateway, the latency for the call is 100 ms.

So, my hypothesis is that API Gateway is the reason for increased latency when it's paired with the Lambda function as well. Can anyone confirm if this is the case, and if so, what is API Gateway doing that increases the latency so much? Is there any way around it? Thank you!
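As a sketch of one way to compare the setups, here's a small stdlib-only timing helper (the commented-out URL is a placeholder, not a real endpoint; min/median/max over repeated calls helps separate the first-connection cost from the steady state):

```python
import statistics
import time

def time_calls(fn, n=20):
    """Call fn() n times; return (min, median, max) latency in milliseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000)
    return min(samples), statistics.median(samples), max(samples)

# Example against a placeholder endpoint (swap in the real System B URLs):
# import urllib.request
# print(time_calls(lambda: urllib.request.urlopen(
#     "https://abc123.execute-api.us-east-1.amazonaws.com/prod/ping").read()))
```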

General Grievance
danielle
  • Interesting observation. I have a theory on a possible cause, but will need some information to test it. To confirm: the API Gateway instance is in the same region as System A, which is an EC2 instance -- correct? Which region? Do an `nslookup` of the API Gateway endpoint hostname from the "System A" machine, and mention the IP address(es) you get in response in a comment. These addresses are shared among many API Gateway endpoints, so revealing them here will not expose any sensitive information as long as you don't also mention your endpoint hostname. – Michael - sqlbot Dec 09 '16 at 10:52

3 Answers


It might not be exactly what the original question asks for, but I'll add a comment about CloudFront.

In my experience, both CloudFront and API Gateway will add at least 100 ms each for every HTTPS request on average - maybe even more.

This is due to the fact that, in order to secure your API call, API Gateway enforces SSL in all of its components. This means that if you are using SSL on your backend, your first API call will have to negotiate three SSL handshakes:

  1. Client to CloudFront
  2. CloudFront to API Gateway
  3. API Gateway to your backend

It is not uncommon for these handshakes to take over 100 milliseconds, meaning that a single request to an inactive API could see over 300 milliseconds of additional overhead. Both CloudFront and API Gateway attempt to reuse connections, so over a large number of requests you’d expect to see that the overhead for each call would approach only the cost of the initial SSL handshake. Unfortunately, if you’re testing from a web browser and making a single call against an API not yet in production, you will likely not see this.
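As a rough illustration of that connection-reuse effect, here's a stdlib-only sketch: a throwaway local HTTP server stands in for API Gateway (so plain HTTP, no TLS), and the number of distinct client source ports shows how many TCP connections were actually opened. The handshake counts are the point, not the absolute timings:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

client_ports = []  # source port of each request; one port = one reused connection

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # required for keep-alive

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Reused connection: one TCP handshake total (with HTTPS it would also be
# one TLS handshake).
conn = http.client.HTTPConnection("127.0.0.1", port)
for _ in range(5):
    conn.request("GET", "/")
    conn.getresponse().read()
conn.close()
reused = set(client_ports)

# Fresh connection per request: a handshake every single time.
client_ports.clear()
for _ in range(5):
    c = http.client.HTTPConnection("127.0.0.1", port)
    c.request("GET", "/")
    c.getresponse().read()
    c.close()
fresh = set(client_ports)

print(len(reused), len(fresh))  # e.g. "1 5": one shared connection vs. one per request
server.shutdown()
```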

In the same discussion, it was eventually clarified what the "large number of requests" should be to actually see that connection reuse:

Additionally, when I said "large", I should have been slightly more precise about scale. 1,000 requests from a single source may not see significant reuse, but APIs seeing that many per second from multiple sources would definitely expect to see the results I mentioned.

...

Unfortunately, while I cannot give you an exact number, you will not see any significant connection reuse until you approach closer to 100 requests per second.

Bear in mind that this is a thread from mid-to-late 2016, and there should be some improvements already in place. But in my own experience this overhead is still present: a load test on a simple API at 2,000 rps was still giving me >200 ms of extra latency as of 2018.

source: https://forums.aws.amazon.com/thread.jspa?messageID=737224

villasv

Heard from Amazon support on this:

With API Gateway, the call has to go from the client to API Gateway, which means leaving the VPC and going out to the internet, then back into your VPC to reach your other EC2 instance, then back to API Gateway, which means leaving your VPC again, and then back to your first EC2 instance.

So this additional latency is expected. The only way to lower the latency is to add API caching, which is only going to be useful if the content you are requesting is static and not updating constantly. You will still see the longer latency when the item is evicted from the cache and needs to be fetched from the system, but it will lower it for most calls.
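If you do go the caching route, here's a sketch of enabling it on a stage with boto3. The API id and stage name are placeholders, the live call is commented out (it needs AWS credentials, and a cache cluster incurs charges), and the wildcard TTL path applies the setting to all methods on the stage:

```python
def cache_patch_ops(ttl_seconds: int = 300, cluster_size: str = "0.5"):
    """Build the patch operations that enable stage-level API Gateway caching."""
    return [
        # Turn the cache cluster on and pick its size (in GB).
        {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
        {"op": "replace", "path": "/cacheClusterSize", "value": cluster_size},
        # Set the cache TTL for every resource/method on the stage.
        {"op": "replace", "path": "/*/*/caching/ttlInSeconds",
         "value": str(ttl_seconds)},
    ]

# Live call (placeholder ids; requires credentials and incurs charges):
# import boto3
# apigw = boto3.client("apigateway")
# apigw.update_stage(restApiId="a1b2c3", stageName="prod",
#                    patchOperations=cache_patch_ops())
```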

So I guess the latency is normal, which is unfortunate, but hopefully not something we'll have to deal with constantly moving forward.

danielle
  • VPC Endpoints should save some latency: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html – JLM Sep 21 '17 at 10:16

In the direct case (#2), are you using SSL? 8 ms is very fast for SSL, although if it's within an AZ I suppose it's possible. If you aren't using SSL there, then using APIGW will introduce a secure TLS connection between the client and CloudFront, which of course has a latency penalty. But usually that's worth it for a secure connection, since the latency is only on the initial establishment.

Once a connection is established all the way through, or when the API has moderate, sustained volume, I'd expect the average latency with APIGW to drop significantly. You'll still see the ~100 ms latency when establishing a new connection though.

Unfortunately the use case you're describing (EC2 -> APIGW -> EC2) isn't great right now. Since APIGW is behind CloudFront, it is optimized for clients all over the world, but you will see additional latency when the client is on EC2.

Edit: And the reason why you only see a small penalty when adding Lambda is that APIGW already has lots of established connections to Lambda, since it's a single endpoint with a handful of IPs. The actual overhead (not connection related) in APIGW should be similar to Lambda overhead.

jackko