I'm consistently being charged for a surprisingly high amount of data transfer out (from Amazon to the Internet). I looked into the usage reports for the past few months and found that the Data Transfer Out comes from an Application Load Balancer (ALB) that sits between the Internet and multiple nodes of my application (internal IPs). I also noticed that DataTransfer-Out-Bytes is very close to DataTransfer-In-Bytes on the same load balancer, which is odd (coincidence?). I was expecting the response to each request to be far smaller than the request itself. So I enabled flow logs on the ALB for a few minutes and found the following:

  • Requests coming from the Internet (public IPs) into the ALB = ~0.47 GB.
  • Requests going from the ALB to application servers in the same availability zone = ~0.47 GB. The ALB simply passes requests through to the application servers, as expected, so this is about the same amount of traffic.
  • Responses from the application servers back into the same ALB = ~0.04 GB. As expected, responses generate far less traffic back into the ALB. Usually a 1K request gets a simple "HTTP 200 OK" response.
  • Responses from the ALB back to the external IP addresses = ~0.43 GB. This was mind-blowing. I was expecting ~0.04 GB, the same amount received from the application servers.
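
For reference, here is a minimal sketch of how per-direction totals like the ones above can be computed from flow log records. It assumes the default VPC flow log format (space-separated, source and destination addresses in fields 4 and 5, bytes in field 10). The ALB node IPs, the VPC CIDR and the function names are hypothetical placeholders, not values from my setup.

```python
import ipaddress
from collections import defaultdict


def classify(src, dst, alb_ips, vpc_cidr):
    """Label one flow record by direction relative to the ALB.

    alb_ips: set of private IPs of the ALB nodes (assumed known).
    vpc_cidr: CIDR of the VPC; anything outside it is treated as Internet.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    src_internal = ipaddress.ip_address(src) in vpc
    dst_internal = ipaddress.ip_address(dst) in vpc
    if dst in alb_ips:
        return "targets_to_alb" if src_internal else "internet_to_alb"
    if src in alb_ips:
        return "alb_to_targets" if dst_internal else "alb_to_internet"
    return "other"


def sum_flow_bytes(lines, alb_ips, vpc_cidr):
    """Sum the bytes field of default-format flow log lines per direction."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        # Skip header lines, short lines, and NODATA/SKIPDATA records ("-" bytes).
        if len(fields) < 14 or fields[0] == "version" or fields[9] == "-":
            continue
        direction = classify(fields[3], fields[4], alb_ips, vpc_cidr)
        totals[direction] += int(fields[9])
    return dict(totals)
```

Calling `sum_flow_bytes(open("flow.log"), {"10.0.1.10"}, "10.0.0.0/16")` on a downloaded log file would return a dict with one byte total per direction, which can then be compared against the four figures listed above.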

Unfortunately, the ALB does not allow me to use packet sniffers (e.g. tcpdump) to see what is actually coming in and going out. Is there anything I'm missing? Any help will be much appreciated. Thanks in advance!

Ricardo.

Ricardo F
  • Data Transfer Out is mystical. I hope you figure out what is going on. – Asdfg Mar 28 '20 at 20:49
  • Does the ALB accept public requests from the internet? Our Beanstalk app servers sometimes get hammered by bots trying to hack them with things like requesting WordPress admin pages (the servers are not running WP). The point being, 1K is for a valid response using the correct endpoint while the ALB could be receiving anything, perhaps something that increases the outgoing bytes beyond the server response? (That's theoretical, I have no idea what request - response pair that would be.) – Dave S Mar 28 '20 at 21:07
  • @RicardoF you made no mention of TLS. The balancer sends the certificate and chain to the browser with each connection, which is standard behavior of any web endpoint that speaks HTTPS. – Michael - sqlbot Mar 29 '20 at 01:45
  • @Michael-sqlbot, good point! TLS/SSL handshakes and certs should add a few KB of data to each session. This may explain the whole thing. Thank you! I did some research on this, and it seems that TLS session resumption may be a good way to avoid a full TLS handshake. It would be great if session state could be retained on the client side for many hours. But I'm still not sure whether TLS session resumption requires any configuration on the server side and/or the client side. Will check and update this thread. Am I heading in the right direction? Any other thoughts are very welcome. Thanks. – Ricardo F Mar 29 '20 at 22:05
  • @DaveS, thanks for your response. Yes, the ALB does accept public requests from the internet, but over 99% of the requests are valid ones. Although there are some random bots every now and then, that wouldn't explain the massive traffic going out of the ALB. – Ricardo F Mar 29 '20 at 22:24

1 Answer

I believe the next step in your investigation would be to enable ALB access logs and see whether you can correlate the "sent_bytes" field in the ALB access logs with either your flow logs or your bill.

For information on ALB access logs see: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html

There is more than one way to analyze the ALB access logs, but I've always been happy with Athena. See: https://aws.amazon.com/premiumsupport/knowledge-center/athena-analyze-access-logs/
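
If you only want a quick total before setting up Athena, something like the following minimal sketch may be enough. It assumes the documented ALB access log layout, where the fields are space-separated and received_bytes and sent_bytes are the 11th and 12th fields, ahead of the first quoted field. The function name is hypothetical, and the log files are assumed to be already downloaded from S3 and decompressed.

```python
def sum_access_log_bytes(lines):
    """Return (received_bytes, sent_bytes) totals over ALB access log lines."""
    received = 0
    sent = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The first 12 fields contain no spaces or quotes, so a plain split
        # is safe; the quoted request and user agent land in the remainder.
        fields = line.split(maxsplit=12)
        if len(fields) < 12:
            continue  # not a complete access log entry
        received += int(fields[10])
        sent += int(fields[11])
    return received, sent
```

Comparing the `sent` total for a given time window with the ALB-to-Internet bytes from the flow logs for the same window should show whether the two sources agree.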

TelamonAegisthus
  • Thanks. Will do. – Ricardo F Mar 29 '20 at 22:16
  • [UPDATE] I have turned ALB access logs on. They don't show this huge amount of Data Transfer OUT. They actually show a pretty low amount of data out: only around 2.1% of the total Data Transfer IN in the same timeframe. I wonder whether the traffic generated by TLS handshakes and certs is counted in the sent_bytes field. Thanks. – Ricardo F Apr 07 '20 at 12:06
  • I'd raise a support case with AWS. There is a possibility that your ALB is getting hit by a trawler or a DoS attack that is generating useless traffic. – TelamonAegisthus Apr 08 '20 at 22:38