I work at a company that runs entirely in the cloud, and we have started a stress-testing project. The idea is to replicate everything in production into a new environment and run stress tests against it to find the total capacity of the system and where the bottlenecks are.
Now, I remember a time when we stress-tested physical servers as well as private clouds, and I remember that it was almost impossible to get a complete copy of production with all its moving parts. Also, even with stress-testing tools like sysbench, JMeter, and ab, you could never simulate traffic exactly like production.
We would usually monitor and profile production as much as we could, identify an issue, and then try to fix that specific issue by reproducing it in the stress-testing environment.
To estimate capacity, we used to (and some still do) extrapolate from measured trends to predict when capacity would be exhausted or when response time would drop below acceptable levels.
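To illustrate the kind of extrapolation I mean, here is a minimal sketch. It assumes response time grows roughly linearly with load over the measured range, which is a simplification; real systems degrade non-linearly near saturation. The function names and sample measurements are hypothetical, not from any specific tool.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def capacity_at_sla(rates, resp_ms, sla_ms):
    """Project the request rate at which response time reaches the SLA limit."""
    a, b = fit_line(rates, resp_ms)
    return (sla_ms - b) / a

# Hypothetical measurements from production monitoring:
# (requests/sec, average response time in ms)
rates = [100, 200, 300, 400]
resp_ms = [50, 70, 90, 110]

# Rate at which response time would hit a 200 ms SLA, per the linear model
print(capacity_at_sla(rates, resp_ms, sla_ms=200))  # 850.0
```

The point is that this only needs production metrics, not a full copy of production, which is why the "old" approach was attractive despite its imprecision.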
Considering that recreating production and stress testing it is quite time- and resource-consuming, is this the best way to find bottlenecks in the system and measure capacity, or is the "old" way better?