What kind of "streamlining" occurs when you use a single machine to generate load/stress?

The title of this question represents my main concern, but if you read on beyond the question section, you'll find some background about our setup, which may or may not be relevant or useful.

Question

We're stress testing our application using Gatling, and are running the Gatling scenario on a single machine. We're finding that our application is able to cope with a high load as generated by the stress tool; however, it is not able to cope with relatively low load from real users.

My question is: what kind of OS/network-level optimisation or streamlining occurs when concurrent requests are made from a single machine/OS to an application, vs concurrent requests from multiple machines (i.e. regular users using their web browsers)?

Background

We have a Tomcat application sitting behind Apache via AJP, which is itself behind a Citrix Netscaler via port 80 (we're also planning on taking Apache out of the equation, but that's another matter).

Our app has been grinding to a halt under relatively low load (CLOSE_WAIT connections building up between Apache and Tomcat), and we're in the process of load testing it to resolve the problem. Deadlocks in our SQL Server instance were showing up quite frequently, so we decided to start there. In order to replicate the problem and subsequently test our fixes, we're using a single machine to generate load using Gatling.

When we first started, we were able to reliably replicate the deadlocks by using the tool. After we made some optimisations the deadlocks went away, and so did the CLOSE_WAIT connections. We then pushed the application to a load we were very happy with, and it ran without any major hiccups.

Unfortunately, when the fixes were applied to the production system, we were still seeing the same original behaviour. This makes me wonder whether the load generated by the stress tool is a poor representation of what's actually going on in the real world, because it originates from a single source rather than from many different clients spread across the internet.

score 1 · Accepted Answer · answered Jan 21 '17 at 03:03

A single load generator will likely do a better job of connection pooling than disparate clients, for example by making better use of keepalives. This makes for more requests over fewer connections.
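To make the keepalive point concrete, here is a rough sketch that counts how many TCP connections a throwaway local server accepts for the same number of requests, once over a single reused connection and once with a fresh connection per request. It uses only the Python standard library; the server, handler and `count_connections` helper are made up for illustration and have nothing to do with Gatling or the poster's stack.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


class CountingServer(ThreadingHTTPServer):
    """Throwaway local server that counts accepted TCP connections."""

    def __init__(self, address, handler):
        super().__init__(address, handler)
        self.accepted = 0

    def get_request(self):
        request = super().get_request()
        self.accepted += 1  # one increment per accepted TCP connection
        return request


def count_connections(requests, reuse):
    """Send `requests` GETs and return how many connections the server saw."""
    server = CountingServer(("127.0.0.1", 0), OkHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # Load-generator style: one connection, many requests.
            conn = http.client.HTTPConnection(host, port)
            for _ in range(requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # Transient-client style: a new connection for every request.
            for _ in range(requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/", headers={"Connection": "close"})
                conn.getresponse().read()
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.accepted


if __name__ == "__main__":
    print("reused connection:", count_connections(20, reuse=True))
    print("new connection per request:", count_connections(20, reuse=False))
```

The request count is identical in both runs; only the number of connection setups and teardowns differs, and that is the part of the stack a well-pooled load generator exercises far less than real users do.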

If round-robin DNS is involved, a single load generator will tend to hit just one of the DNS destinations rather than spread the load across all of them. Some load balancers make stickiness decisions based on client IP, which would be static in this case.
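As an illustration of the client-IP stickiness effect, here is a minimal sketch of a source-IP hash, which is one common way to implement stickiness (not necessarily what any particular load balancer does). The backend names and IP addresses are hypothetical.

```python
import hashlib
from collections import Counter

BACKENDS = ["app-1", "app-2", "app-3"]  # hypothetical backend pool


def pick_backend(client_ip, backends=BACKENDS):
    """Map a client IP to a backend with a stable hash (source-IP stickiness)."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def distribution(client_ips):
    """Count how many requests each backend would receive."""
    return Counter(pick_backend(ip) for ip in client_ips)


# One load generator: 300 requests, all from the same source address.
single_source = distribution(["10.0.0.5"] * 300)

# Real users: one request each from 250 different source addresses.
many_sources = distribution([f"203.0.113.{i}" for i in range(1, 251)])

print("single source:", dict(single_source))
print("many sources: ", dict(many_sources))
```

With a single source address every request lands on the same backend, so the other backends see no load at all during the test.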

Your load generator may have a constrained execution pool (say, 200 'users') so that latency in response causes the users to slow down, as opposed to the real world where you have a much larger number of users who do not patiently wait for other users to finish.
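The constrained-pool effect can be put into numbers with Little's law. The sketch below is back-of-the-envelope arithmetic only: the 200 users come from the example above, while the think time, latencies and arrival rate are made-up values.

```python
def closed_model_throughput(users, latency_s, think_time_s):
    """Requests/second when a fixed pool of users each wait for a response
    (plus think time) before sending the next request."""
    return users / (latency_s + think_time_s)


def open_model_in_flight(arrival_rate_per_s, latency_s):
    """Average concurrent requests when new requests keep arriving at a
    fixed rate regardless of how slow the server is (Little's law)."""
    return arrival_rate_per_s * latency_s


# Hypothetical numbers: 200 pooled users, 1 s think time.
for latency in (0.1, 2.0):
    rate = closed_model_throughput(users=200, latency_s=latency, think_time_s=1.0)
    print(f"closed pool, latency {latency}s: {rate:.1f} req/s offered")

# Hypothetical numbers: real users arriving at a steady 180 req/s.
for latency in (0.1, 2.0):
    in_flight = open_model_in_flight(arrival_rate_per_s=180, latency_s=latency)
    print(f"open arrivals, latency {latency}s: {in_flight:.0f} requests in flight")
```

In the pooled case the offered load drops as the server slows down, which gives it room to recover. With independent arrivals the number of in-flight requests grows instead, which is closer to what a struggling production system sees.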

Jason Martin


Thanks for your answer. Are there any HTTP headers that could be set by the load tool that might help with this? By the way, this is falling over at a very low load (~50-100 concurrent users), and the load test has only pushed up to 500 concurrent users. – jlb Jan 21 '17 at 10:44
You might be able to disable keepalives in your load tool to better simulate the transient nature of real users. Separately, make sure connection pooling is enabled on your load balancer. – Jason Martin Jan 21 '17 at 17:53

score 1 · Answer 2 · answered Jan 23 '17 at 09:40

It is hard to say anything without seeing your Gatling test scenario. Just a "blind shot": your Gatling test may not accurately represent real users, e.g.:

Embedded resources. Real browsers download external resources embedded in the page (images, scripts and styles) and do so in parallel. If your Gatling test is missing the `inferHtmlResources` method, the load coming from Gatling may be much lower than that produced by real users sitting behind real browsers.

DNS caching. Gatling may hit only one IP address because the IP addresses behind DNS names are cached at the JVM level. As per the Gatling FAQ:

Basically, Gatling/JVM’s DNS cache has to be tuned. A solution is to add `-Dsun.net.inetaddr.ttl=0` to the command line.

AJAX requests. Gatling doesn't execute client-side JavaScript, so if your application relies on XMLHttpRequest calls, they won't be fired when Gatling hits the page. You will need to handle them manually if your application uses some form of AJAX.
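To illustrate the DNS caching point above, here is a toy model rather than real DNS: a round-robin "authoritative" resolver that hands out one address per lookup, and a client-side cache that never expires (a deliberate simplification). The class names, host name and addresses are all hypothetical.

```python
import itertools
from collections import Counter


class RoundRobinDns:
    """Toy resolver that rotates through a fixed list of addresses."""

    def __init__(self, addresses):
        self._cycle = itertools.cycle(addresses)

    def resolve(self, name):
        return next(self._cycle)


class CachingResolver:
    """Toy client-side cache that keeps the first answer forever."""

    def __init__(self, upstream):
        self._upstream = upstream
        self._cache = {}

    def resolve(self, name):
        if name not in self._cache:
            self._cache[name] = self._upstream.resolve(name)
        return self._cache[name]


ADDRESSES = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]  # documentation-range IPs

uncached = RoundRobinDns(ADDRESSES)
cached = CachingResolver(RoundRobinDns(ADDRESSES))

uncached_hits = Counter(uncached.resolve("app.example.com") for _ in range(6))
cached_hits = Counter(cached.resolve("app.example.com") for _ in range(6))

print("fresh lookup each time:", dict(uncached_hits))
print("cached lookup:         ", dict(cached_hits))
```

The cached client sends everything to the first address it learned, which is roughly the situation the TTL setting quoted above is meant to avoid.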

So I would recommend referring to How To Make JMeter Behave More Like A Real Browser and implementing an equivalent Gatling setup: if a load test doesn't represent real-life load, there is little point in running it.

Thanks Dmitri. I've got `inferHtmlResources` enabled on my scenario, which also covers all of the AJAX requests that are made along the user journey path that I've decided to test. I haven't done anything to circumvent DNS caching (my reason is that I'm looking to test everything beyond DNS resolution), though I'll try dropping the TTL to 0 as you suggested. – jlb Jan 23 '17 at 10:26
