6

I just took over administration of a new environment. A known issue is that the environment is known for high response times (20+ seconds), so I figured I'd turn on haproxy logging and see what is going on. I figured I'd see slow load times in the app servers, but I'm actually seeing high Tq values in HAProxy. The HAProxy is on EC2 and is NOT behind ELB.

Sep  5 14:22:00 haproxy-apps01 haproxy[24695]: 76.14.153.221:3371 [05/Sep/2012:14:21:49.780] http-in default_apps/fe04-c 10936/0/0/55/10991 200 488 - - ---- 111/111/0/1/0 0/0 "GET /event_times/next?callback=jQuery170189312373075111_1346854917562&_=1346854918453 HTTP/1.1"

As you can see, this one has a Tq of about 10 seconds. Not all the Tq's are high (1+ seconds), but a good percentage of them are (approx 35%). Normally when I see this behavior, I'd expect there to be network issues, but this is an incredibly high percentage of visitors to be having an issue like this, so I'm wondering if anybody has seen this or have any hints on diagnosing if the issue could possibly be on this box?

Will
  • 826
  • 2
  • 9
  • 19

2 Answers2

4

High Tq times are not always indicative of a problem, is http-server-close set?

As per the documentation:

Setting "option http-server-close" may display larger request times since "Tq" also measures the time spent waiting for additional requests.

For example, this is set for Stack Overflow, and as some example data, the follow query:

Select Top 20 Tq from LogsLastTwoDays WITH (NoLOCK) WHERE
CreationDate > DATEADD(minute, -5, GETUTCDATE()) AND 
ResponseCode = 200 AND Host = 'stackoverflow.com'
ORDER by Tq DESC

Yields:

Tq    
----- 
14990 
14987 
14986 
14983 
14974 
14972 
14972 
14965 
14964 
14964 
14962 
14961 
14960 
14955 
14952 
14951 
14945 
14943 
14935 
14932 

This is because with http-server-close, the connection is kept open to the client (a persistent connection) for up to timeout http-keep-alive 15s in our environment.

I usually focus on Tr first, because that shows the response time between the LB and the server.

Kyle Brandt
  • 83,619
  • 74
  • 305
  • 448
  • Sorry, I should have specified. No, http-server-close is not set – Will Sep 05 '12 at 14:54
  • @Will: Okay, will leave this incase someone else has the same question in the future. My guess is a network / resource problem then. Have you checked dmesg and all the vitals? No issues with conntrack table filling etc? – Kyle Brandt Sep 05 '12 at 15:08
  • As a side note, what are you using to query the logs against? – Will Sep 05 '12 at 18:00
  • 1
    @Will: We wrote a service that takes syslog data via UDP, parses it with a regex, and inserts it into MS SQL Server. Each day gets its own table. – Kyle Brandt Sep 05 '12 at 18:34
  • Did you all post the source code anywhere? Would love to modify it to send data elsewhere. – Will Sep 06 '12 at 14:29
4

Pasting your configuration somewhere could help.

This could be related to timeout configuration and to HTTP mode as well. As Kyle mentionned I would have vote for the http-server-close option first. Since you have not enable it, you might be be under a slow-loris like attack.

Could you setup a "timeout http-request" parameter to 5s and see if you still have those line or if they are replaced by a 408 response ?

cheers