I have a strange problem with varnish, it's in front of an API and it's caching the whole responses. It mostly works fine, but from time to time a request will take 5 seconds (or rarely 10 seconds, or 15 seconds .. always an increment of 5) more than usual to return.
I've tried bypassing the HAProxy in front, same, and I checked, it does that wether the URL is already cached or not (I've checked the Age header). So it can't be the backend since the page is in cache, it's not what's in front of varnish, that leaves only varnish itself as the cause of that problem.
Any idea as to what could cause that 5 seconds delay ? I've checked varnishlog, during that delay varnish isn't doing anything. I've also tried manually making another request during that delay, and varnish answered fine so it's not frozen or anything, it works fine. And at the end of that 5 seconds, it's outputting the log for the request as usual, nothing weird in it. Example :
- << Request >> 132712
- Begin req 132711 rxreq
- Timestamp Start: 1499701302.309413 0.000000 0.000000
- Timestamp Req: 1499701302.309413 0.000000 0.000000
- ReqStart 127.0.0.1 43955
- ReqMethod GET
- ReqURL /url
- ReqProtocol HTTP/1.1
- ReqHeader User-Agent: curl/7.38.0
- ReqHeader Host: host
- ReqHeader Accept: /
- ReqHeader X-Forwarded-Proto: https
- ReqHeader X-Forwarded-For: ip
- ReqHeader Connection: close
- ReqUnset X-Forwarded-For: ip
- ReqHeader X-Forwarded-For: ip, 127.0.0.1
- VCL_call RECV
- ReqUnset X-Forwarded-For: ip, 127.0.0.1
- ReqHeader X-Forwarded-For: ip, 127.0.0.1, 127.0.0.1
- VCL_return hash
- VCL_call HASH
- VCL_return lookup
- Hit 2147582482
- VCL_call HIT
- VCL_return deliver
- RespProtocol HTTP/1.1
- RespStatus 200
- RespReason OK
- RespHeader Date: Mon, 10 Jul 2017 15:10:00 GMT
- RespHeader Server: gunicorn/19.7.1
- RespHeader content-type: application/json; charset=UTF-8
- RespHeader X-Varnish: 132712 98834
- RespHeader Age: 1902
- RespHeader Via: 1.1 varnish-v4
- VCL_call DELIVER
- RespHeader X-Cacheable: YES
- RespUnset Server: gunicorn/19.7.1
- RespUnset Via: 1.1 varnish-v4
- RespUnset X-Varnish: 132712 98834
- VCL_return deliver
- Timestamp Process: 1499701302.309480 0.000067 0.000067
- RespHeader Content-Length: 251799
- Debug "RES_MODE 2"
- RespHeader Connection: close
- RespHeader Accept-Ranges: bytes
- Timestamp Resp: 1499701302.309571 0.000159 0.000092
- Debug "XXX REF 2"
- ReqAcct 198 0 198 197 251799 251996
- End
I realize varnish believes that was treated quickly, but on curl's side it took 5 seconds. Curl is used directly on the varnish server, so it's not network latency. It's a bit hard to reproduce, I'm using a script that does queries in a loop and shows the curl time_total to finally get it to happen.
Could it be something Linux side ? Maybe some kind of limit, or a socket cleanup job or something that would pause the request. It happens maybe once every 400 or 500 requests, sometimes more, sometimes less.