When is horizontal scaling likely to solve your scaling problems?
Let's say you have a single API node (no DB) and a goal of sustaining 10k RPS over 5 minutes with a p95 latency < x ms. Requests are coming in and you start to see the p95 go above your x ms goal. If you don't see any clear metrics indicating resource pressure on the node (> 75% CPU, > 75% RAM, etc.), is it safe to assume horizontal scaling is likely the solution?
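To make the scenario concrete, here is a minimal sketch of the check I have in mind. It assumes you already collect per-request latencies and average CPU/RAM utilization for the test window. The names `p95`, `WindowStats`, and `assess` are hypothetical (not from any library), and the 75% threshold is just the rule of thumb from above.

```python
from dataclasses import dataclass


def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of latency samples (ms)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


@dataclass
class WindowStats:
    """Hypothetical container for one test window's measurements."""
    latencies_ms: list
    cpu_utilization: float  # 0.0 to 1.0, averaged over the window
    ram_utilization: float  # 0.0 to 1.0, averaged over the window


def assess(stats, p95_goal_ms, saturation_threshold=0.75):
    """Report whether the latency goal was missed and whether CPU or RAM
    crossed the (assumed) saturation threshold during the window."""
    observed = p95(stats.latencies_ms)
    return {
        "p95_ms": observed,
        "latency_goal_missed": observed > p95_goal_ms,
        "cpu_saturated": stats.cpu_utilization > saturation_threshold,
        "ram_saturated": stats.ram_utilization > saturation_threshold,
    }
```

The case I'm asking about is the one where `latency_goal_missed` is true but both `cpu_saturated` and `ram_saturated` are false.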
At first I thought the answer was "yes", but then I saw this article. Vertically scaling a Node.js application from a large to an xlarge AWS instance allowed it to go from 10k RPS to 25k RPS. How is that possible? CPU utilization on the 10k test was around 10% (not that high). It's possible it's memory, but that seems unlikely. Am I missing something? Or is horizontal scaling just cheaper than vertical scaling, with the additional benefit of resiliency?