A Node.js app using cluster, deployed on Google App Engine flexible, does scale, but the traffic always gets sent to one instance.
The app uses the cluster module to take advantage of all the CPUs, and the only way to specify a scaling measure is to use cpu_utilization, so we did that.
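For context, here is a minimal sketch of the kind of cluster setup described above. It is not the actual app code: the function names, the fallback port 8080, and the restart-on-exit behaviour are assumptions made for illustration.

```javascript
const cluster = require('node:cluster');
const http = require('node:http');
const os = require('node:os');

// Number of workers to fork: one per CPU reported by the OS (at least one).
function workerCount() {
  return Math.max(1, os.cpus().length);
}

// Primary process: fork the workers and replace any worker that exits.
function startPrimary() {
  const count = workerCount();
  for (let i = 0; i < count; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} exited, forking a replacement`);
    cluster.fork();
  });
}

// Worker process: each worker runs its own HTTP server on the shared port.
function startWorker(port) {
  return http
    .createServer((req, res) => {
      res.end(`handled by worker ${process.pid}\n`);
    })
    .listen(port);
}

// Entry point: the primary forks, the workers serve.
// cluster.isPrimary exists on Node 16+; older versions expose isMaster.
function main() {
  const isPrimary = cluster.isPrimary ?? cluster.isMaster;
  if (isPrimary) {
    startPrimary();
  } else {
    // PORT is assumed to be provided by the hosting environment.
    startWorker(Number(process.env.PORT) || 8080);
  }
}
```

Calling `main()` at the bottom of the file starts the app. Note that this only spreads requests across the CPUs of a single instance; distributing traffic between instances is up to the platform's load balancer.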
The scaling works fine: once it reaches the target_utilization, it spawns another instance.
But the issue is that the same load test takes exactly the same time no matter how many instances are up. This suggests that the traffic is not being split between all instances.
So I'm wondering whether the traffic is always going to the same instance. Is there any way to prove this, or to improve it?
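One way to check this would be to have every response say which instance and which worker handled it, then count the distinct values seen during the load test. The sketch below is only illustrative: it assumes the runtime exposes the instance name in a `GAE_INSTANCE` environment variable (falling back to the hostname if it does not), and the header names `X-Instance-Id` and `X-Worker-Pid` are made up for this example.

```javascript
const http = require('node:http');
const os = require('node:os');

// Identify the instance serving the request.
// Assumption: GAE_INSTANCE is set by the App Engine runtime;
// otherwise fall back to the machine's hostname.
function instanceId(env = process.env) {
  return env.GAE_INSTANCE || os.hostname();
}

// Build a server that tags every response with the instance and worker
// that handled it (hypothetical header names, chosen for this sketch).
function createTaggedServer(env = process.env) {
  return http.createServer((req, res) => {
    const id = instanceId(env);
    res.setHeader('X-Instance-Id', id);
    res.setHeader('X-Worker-Pid', String(process.pid));
    res.end(`instance=${id} pid=${process.pid}\n`);
  });
}
```

Each cluster worker would call `createTaggedServer().listen(port)`. If the load test only ever sees a single `X-Instance-Id` value while several instances are up, that would support the suspicion that traffic is not being distributed.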
Edit:
The load test is just a regular load test; the first one used 20 threads with a ramp-up of 5 seconds, looped 4 times.
Edit 2:
Update: It appears to be scaling correctly now. I'm not sure my code base changed much, but the routing is done properly now too. Maybe a Google Cloud Platform update fixed my issue?