I am using Varnish 4 + nginx + ELB. When I run varnishlog, I keep seeing a new session like the following every 2 seconds:

*   << Session  >> 65622     
-   Begin          sess 0 HTTP/1
-   SessOpen       10.90.148.245 16560 :80 10.13.12.136 80 1476955364.127661 17
-   SessClose      RX_TIMEOUT 5.005
-   End      

When I remove the Varnish server from the ELB, I don't see these sessions. Where are these sessions coming from? (The ELB health check interval is 300 seconds.)

I found this issue while investigating why the ELB removes the instance from service after a couple of days.
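To see which addresses open these empty sessions, a minimal Python sketch can tally `varnishlog` session records by client IP. It assumes the Varnish 4 text layout shown above: `SessOpen` starts with the remote IP, `SessClose` carries the close reason and the session duration, and a session that actually served requests contains `Link` records. The function names are hypothetical; this is not a Varnish tool.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Transaction header, e.g. "*   << Session  >> 65622" -> (kind, vxid).
HEADER_RE = re.compile(r"^\*\s+<<\s*(\w+)\s*>>\s+(\d+)")
# Record line, e.g. "-   SessClose      RX_TIMEOUT 5.005" -> (tag, payload).
RECORD_RE = re.compile(r"^-\s+(\w+)\s*(.*)$")


@dataclass
class Session:
    vxid: int
    remote_ip: Optional[str] = None
    close_reason: Optional[str] = None
    duration: Optional[float] = None
    had_request: bool = False  # set once a Link record points at a child request


def parse_sessions(lines: Iterable[str]) -> List[Session]:
    """Collect Session transactions from plain-text varnishlog output."""
    sessions: List[Session] = []
    current: Optional[Session] = None
    for raw in lines:
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            # Only Session transactions are tracked; Request/BeReq blocks are skipped.
            current = Session(vxid=int(header.group(2))) if header.group(1) == "Session" else None
            if current is not None:
                sessions.append(current)
            continue
        record = RECORD_RE.match(line)
        if record is None or current is None:
            continue
        tag, fields = record.group(1), record.group(2).split()
        if tag == "SessOpen" and fields:
            current.remote_ip = fields[0]        # first field: client address
        elif tag == "SessClose" and len(fields) >= 2:
            current.close_reason = fields[0]     # e.g. RX_TIMEOUT
            current.duration = float(fields[1])  # seconds the session stayed open
        elif tag == "Link":
            current.had_request = True
        elif tag == "End":
            current = None
    return sessions


def idle_sessions_by_ip(sessions: Iterable[Session]) -> Counter:
    """Count sessions that timed out without serving a single request, per client IP."""
    return Counter(
        s.remote_ip for s in sessions
        if not s.had_request and s.close_reason == "RX_TIMEOUT"
    )
```

For example, after saving the text output with `varnishlog > sessions.log`, `idle_sessions_by_ip(parse_sessions(open("sessions.log")))` shows which addresses open sessions that never send a request.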

guyyug
  • What's weird is that nothing is done in your session. The log means that `10.90.148.245` is opening an HTTP/1 session on your Varnish, then does nothing, and the session finally closes when it times out. Is `10.90.148.245` your ELB? – Benjamin Baumann Oct 20 '16 at 10:21
  • There are sessions from two IPs: the one above and this one: 10.104.144.137. They are not the ELB IP addresses. – guyyug Oct 20 '16 at 11:54
  • Just out of curiosity, how do you look for these sessions on the EC2 servers (when Varnish is not the backend)? If you look at HTTP logs you won't see them, as no requests are made... – Benjamin Baumann Oct 20 '16 at 13:26
  • I meant that I don't see these sessions when the Varnish server is not behind the ELB (I just edited the description). – guyyug Oct 20 '16 at 13:34
  • Do you have any hint about which service lies behind these IPs? It could be the two ELB services (one per AZ). These are private IPs, so it is probably some AWS component doing its job. How AWS does this internally is not documented, so it's difficult to know what it is. It's frustrating, but we have to deal with it (I had similar problems in AWS with open sockets being closed when inactive...) – Benjamin Baumann Oct 20 '16 at 13:54
  • It might be that. How did you solve the issue with the open sockets? I had a crazy number of sockets open, and I believe this is what caused the server to stop responding (I got 499 responses from nginx). – guyyug Oct 20 '16 at 14:07
  • Since I was opening the socket in my own code, I had full control over it. I send a TCP keepalive packet through the socket every 4 minutes. If there was no traffic for 5 minutes, AWS dropped the connection (without telling me): the next time I tried to send a packet, I received an extremely quick (<1 ms) RST from the IP I was connected to. That response clearly did not come from that IP; it was AWS telling me to reset the socket... – Benjamin Baumann Oct 20 '16 at 16:07
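Keeping traffic flowing every 4 minutes on a socket you own could look like the following minimal Python sketch, which enables kernel-level TCP keepalive. The 240-second idle time mirrors the comment above; the probe interval and probe count are illustrative assumptions, and the `TCP_KEEP*` constants are platform-specific, so the code only sets the ones Python exposes on the current system.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 240,
                         interval_s: int = 30, probes: int = 4) -> None:
    """Ask the kernel to send TCP keepalive probes on an otherwise idle socket.

    idle_s=240 (4 minutes) stays below the ~5 minutes of silence after which
    the commenter saw AWS drop the connection.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe: TCP_KEEPIDLE on Linux, TCP_KEEPALIVE on macOS.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    # Seconds between unanswered probes, and how many before the kernel gives up.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(sock)
# sock.connect(("example.internal", 9000))  # hypothetical peer; connect as usual afterwards
```

The comment does not say whether kernel keepalive or application-level pings were used; either approach keeps the connection from looking idle.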

1 Answer

"They are not the ELB IP addresses."

Are you sure?

Each node in an ELB (usually there are two or three nodes in a low-traffic environment) has two IP addresses: a public address and a private address.

Find the IP addresses in question in the EC2 console, under Network & Security > Network Interfaces. You should find the "Attachment owner" set to amazon-elb and "Description" set to the name of the balancer.
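If you prefer to script this lookup, a rough Python sketch using boto3's `describe_network_interfaces` with the `addresses.private-ip-address` filter is below. It assumes the console's "Attachment owner" corresponds to `Attachment.InstanceOwnerId` in the API response and that classic ELB interfaces carry a description of the form `ELB <name>`; both helper names are hypothetical, and `lookup_ip` needs AWS credentials allowed to call `ec2:DescribeNetworkInterfaces`.

```python
from typing import Any, Dict, Optional


def elb_name_for_eni(response: Dict[str, Any]) -> Optional[str]:
    """Return the load balancer name if an interface in a DescribeNetworkInterfaces
    response looks like it belongs to an ELB node, else None."""
    for eni in response.get("NetworkInterfaces", []):
        owner = eni.get("Attachment", {}).get("InstanceOwnerId")
        requester = eni.get("RequesterId")
        if "amazon-elb" in (owner, requester):
            description = eni.get("Description", "")
            # Assumed format "ELB <name>"; otherwise return the raw description.
            if description.startswith("ELB "):
                return description[len("ELB "):]
            return description
    return None


def lookup_ip(ip: str, region: str) -> Optional[str]:
    """Query EC2 for the interface holding `ip` (requires boto3 and AWS credentials)."""
    import boto3  # imported here so elb_name_for_eni works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_network_interfaces(
        Filters=[{"Name": "addresses.private-ip-address", "Values": [ip]}]
    )
    return elb_name_for_eni(response)
```

For example, `lookup_ip("10.90.148.245", "us-east-1")` (the region is a placeholder) would return the balancer name, or `None` if the address belongs to something other than an ELB.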

If it's really not your ELB, then this should tell you what it is.

If it is your ELB, then these are almost certainly "spare" connections that the ELB is trying to hold open to your instance for performance reasons: they avoid the delay of setting up a new connection when the next client request arrives.

Your Varnish setup is closing them rather quickly (after about 5 seconds, matching the `RX_TIMEOUT 5.005` in your log), so the ELB tries again.

You should be able to increase Varnish's `timeout_idle` parameter (default 5 seconds in Varnish 4) to something larger than the ELB's idle timeout (default 60 seconds), and you should then see far fewer of these sessions.
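A back-of-the-envelope Python sketch of why raising the timeout helps follows. It assumes, without documentation to confirm it, that the ELB reopens each spare connection as soon as it is closed, and it treats the two client IPs seen in the comments as two spare connections; both numbers are illustrative, and the function names and the 5-second margin are hypothetical. In Varnish itself the parameter can be changed at runtime with `varnishadm param.set timeout_idle 65` (lost on restart) or at startup with `varnishd -p timeout_idle=65`.

```python
VARNISH4_DEFAULT_TIMEOUT_IDLE_S = 5.0  # Varnish 4.x default for timeout_idle
ELB_DEFAULT_IDLE_TIMEOUT_S = 60.0      # classic ELB default idle timeout


def idle_sessions_per_minute(spare_connections: int, timeout_idle_s: float) -> float:
    """Open/close cycles per minute if every spare connection is reopened
    immediately after being closed for idleness (assumed behaviour)."""
    if timeout_idle_s <= 0:
        raise ValueError("timeout must be positive")
    return spare_connections * 60.0 / timeout_idle_s


def suggested_timeout_idle(elb_idle_timeout_s: float, margin_s: float = 5.0) -> float:
    """Return a Varnish idle timeout above the ELB's, so that the ELB side
    is the one that closes idle connections (the margin is arbitrary)."""
    return elb_idle_timeout_s + margin_s


before = idle_sessions_per_minute(2, VARNISH4_DEFAULT_TIMEOUT_IDLE_S)
# Once timeout_idle exceeds 60 s, the ELB's own idle timeout ends each cycle instead.
after = idle_sessions_per_minute(2, ELB_DEFAULT_IDLE_TIMEOUT_S)
print(f"before: {before:.0f}/min (one every {60 / before:.1f} s), after: {after:.0f}/min")
# prints "before: 24/min (one every 2.5 s), after: 2/min"
```

One new session every 2.5 seconds is in the same range as the roughly 2 seconds reported in the question, which is consistent with the spare-connection explanation but does not prove it.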

This advice would be different if Varnish (or any other web server) were directly exposed to the Internet, because you wouldn't want random browsers tying up resources. But with ELB in HTTP/HTTPS (not TCP) mode, the ELB manages idle persistent connections from browsers without each one consuming a connection to your instance, and there is no 1:1 correlation between connections on the front side and the back side of the ELB.
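To illustrate the missing 1:1 correlation, here is a toy Python model of a proxy that keeps a small pool of persistent backend connections and reuses them across many client requests. It is a generic connection-reuse sketch that assumes requests arrive in fixed-size waves; ELB's actual implementation is not documented, and `BackendPool` and `simulate` are hypothetical names.

```python
from collections import deque
from typing import Deque


class BackendPool:
    """Idle persistent backend connections; a connection is just an integer id here."""

    def __init__(self) -> None:
        self.idle: Deque[int] = deque()
        self.opened = 0

    def acquire(self) -> int:
        if self.idle:
            return self.idle.popleft()  # reuse an idle backend connection
        self.opened += 1                # otherwise open a new one
        return self.opened

    def release(self, conn: int) -> None:
        self.idle.append(conn)          # keep it open for the next request


def simulate(client_connections: int, requests_per_client: int, concurrency: int) -> int:
    """Serve all client requests in waves of at most `concurrency` and
    return how many backend connections were ever opened."""
    pool = BackendPool()
    total = client_connections * requests_per_client
    served = 0
    while served < total:
        wave = min(concurrency, total - served)
        conns = [pool.acquire() for _ in range(wave)]  # requests in flight together
        for conn in conns:
            pool.release(conn)
        served += wave
    return pool.opened


print(simulate(1000, 3, 10))  # prints 10
```

With 1,000 client connections making 3 requests each, but never more than 10 in flight at once, only 10 backend connections are opened; the backend-side count tracks concurrency, not the number of browsers.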

Michael - sqlbot
  • Thanks for this. I changed `timeout_idle` and now I see this log entry every 60 seconds. Maybe you can shed light on this: http://stackoverflow.com/questions/40200938/varnish-nginx-elb-499-responses – guyyug Oct 23 '16 at 08:22