1

A Node.js app that uses the cluster module, deployed on Google App Engine flexible, does scale, but the traffic always gets sent to one instance.

The app uses the cluster module to take advantage of all the CPUs, and the only way to specify a scaling measure is to use cpu_utilization, so we did that.

The scaling works fine: once it reaches the target_utilization, it spawns another instance.

The issue is that the same load test takes exactly the same time no matter how many instances are up. This could only mean that the traffic is not being split between all instances.

So I'm wondering whether the traffic is always going to the same instance. Is there any way to prove this or to improve it?

Edit:

The load test is just a regular load test. The first one was 20 threads with a ramp-up of 5 seconds, looped 4 times.

Edit 2:

Update: It appears to be scaling correctly now. I'm not sure my code base changed much, but the routing is done properly too. A Google Cloud Platform update may have fixed my issue?

PCS-I

2 Answers

0

If you start the test while only the minimum number of instances is running and the app scales up during the test, the time won't change between tests. Scaling lets the application match its resources to the number of incoming requests, which avoids 5XX errors.

If your test causes instances to spawn, then those instances are doing work; if they were not, they would be shut down as unneeded. Scaling does not speed up the serving time. It matches the resources to the number of requests at any given point, so the application can always process requests at the same speed.
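To put rough numbers on this, here is a minimal sketch of a closed-loop load test, where each thread waits for a response before sending its next request. The model and every parameter name (`serviceMs`, `slotsPerInstance`, and so on) are assumptions made for illustration, not measurements of App Engine.

```javascript
// Rough model, assuming each instance can serve `slotsPerInstance` requests
// concurrently and each request takes `serviceMs` when it is not queued.
function estimateTestDurationMs({ threads, loops, serviceMs, slotsPerInstance, instances }) {
  const capacity = slotsPerInstance * instances;
  // Requests only slow down once the threads outnumber the available slots.
  const queueFactor = Math.max(1, threads / capacity);
  return loops * serviceMs * queueFactor;
}

// 5 threads against 8 slots: extra instances cannot shorten the test.
const small = { threads: 5, loops: 4, serviceMs: 1000, slotsPerInstance: 8 };
console.log(estimateTestDurationMs({ ...small, instances: 1 }));
console.log(estimateTestDurationMs({ ...small, instances: 6 }));
```

Under this model the duration only drops with more instances when the test is large enough to saturate a single instance, which is one possible reason a small test shows no change.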

Buckors
  • Thank you for the reply. As I said, the scaling works fine: as the number of requests increases, so does the number of instances. The issue is that, with more instances already running, the load test still takes the same amount of time to finish. Whether I have x instances or x + 5 instances running during the test, the time is the same. The only explanation I can see is that the traffic is always routed to the same instance. – PCS-I Mar 21 '19 at 15:22
  • Please provide more information about the test so I can try to help you further. – Buckors Mar 21 '19 at 15:26
  • I trimmed the initial test down to the most basic stress test: 5 threads simultaneously making a POST request to the app running in the cloud. For context, it is a Node.js app that generates all kinds of reports. As I said, having more instances or only 1 does not change the duration of the test. What kind of information would help the investigation most? – PCS-I Mar 21 '19 at 21:56
  • How many requests are those 5 threads making? Could it be that the test is not "stressful" enough? Could you try stressing the app with Apache Bench and send me the command you used? Here is a guide to Apache Bench: https://www.tutorialspoint.com/apache_bench/apache_bench_quick_guide.htm – Buckors Mar 26 '19 at 16:52
  • Will do. Give me some time and I will provide detailed updates. – PCS-I Mar 26 '19 at 17:32
  • Update: It appears to be scaling correctly now. Not sure my codebase changed much, but the routing is properly done too. An update may have fixed my issue? – PCS-I Jun 20 '19 at 18:37
0

Buckors' explanation of scaling is correct; however, you should still see an improvement in your load tests. A quick way to check that your instances are actually receiving a distributed load is to check the Stackdriver logs.

You can go to

Stackdriver -> GAE Application - [Service Name]

There you can see the list of requests your service is handling or has handled. You can then filter the list by individual App Engine flex instance ID. The IDs are listed in the Cloud Console under App Engine -> Instances and look like "aef-Service Name-alpha numeric".

If your requests are not being routed properly, I encourage you to report it in a Private Issue, where Google Cloud engineering should be able to look into your project.
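As a complement to the logs, the app itself can report which instance served each request. The sketch below assumes the `GAE_INSTANCE` environment variable that App Engine sets on its instances; the `X-Served-By` header name and the helper names are hypothetical, chosen only for this example.

```javascript
const os = require('os');

// GAE_INSTANCE is expected on App Engine; fall back to the hostname elsewhere.
function instanceId(env = process.env) {
  return env.GAE_INSTANCE || os.hostname();
}

// Wraps any (req, res) handler so every response carries
// "<instance id>/<worker pid>" in a hypothetical X-Served-By header.
function withInstanceHeader(handler, env = process.env) {
  return (req, res) => {
    res.setHeader('X-Served-By', `${instanceId(env)}/${process.pid}`);
    return handler(req, res);
  };
}

// Counts requests per instance from the header values that the
// load-test client collected (the part before the last slash).
function tallyByInstance(headerValues) {
  const counts = {};
  for (const value of headerValues) {
    const id = value.slice(0, value.lastIndexOf('/'));
    counts[id] = (counts[id] || 0) + 1;
  }
  return counts;
}
```

If the tally from a load test contains a single instance ID while several instances are running, that supports the suspicion about routing; several IDs with comparable counts suggest the traffic is being split.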

ZUKINI
  • Update: It appears to be scaling correctly now. Not sure my codebase changed much, but the routing is properly done too. An update may have fixed my issue? – PCS-I Jun 20 '19 at 18:37