I'm currently working on a Node.js application used by over 25,000 people. We're using the Sails.js framework with MongoDB. The application runs on an EC2 instance with 30 GB of RAM, and the database runs on an AWS-based MongoLab cluster in the same zone as the EC2 instance. We also have an ElastiCache Redis instance with 1.5 GB of storage.
The main problem we're facing is LATENCY. When concurrent usage peaks, we get multiple timeouts, the Sails application grows past 7.5 GB of RAM, HTTP requests to the API take longer than 15 seconds (which is unacceptable), and we even get 502 and 504 responses from nginx.
Mongo write operations appear to be our main source of latency, but even GET requests take a long time during demand peaks. I can't access the production servers; I only have Keymetrics, the pm2 monitoring tool (which is actually great), and New Relic alerts.
So I'd like a roadmap for coping with these issues. I can provide more detailed information if needed; so far, the application seems stable when few users are online.
What are the main factors and setup choices to consider?
I have a rough idea of what I should do, but I'm not sure about the details or the how.
IMHO:
- Cache as much as possible.
- Delay MongoDB write operations.
- Move the data with higher write demand into separate Mongo databases.
- Virtualize?
- Tune the Node setup.
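For the caching point, what I have in mind is a cache-aside read path: check the cache first, hit Mongo only on a miss, and let concurrent misses for the same key share one query so a peak does not fire the same query hundreds of times. This is only a sketch under assumptions: `CacheAside` is a hypothetical helper of mine (not a Sails or Redis API), and an in-memory Map with TTLs stands in for the ElastiCache Redis instance so the example runs on its own.

```javascript
// Hypothetical cache-aside helper. A Map with expiry times stands in for Redis
// (assumption); in production, peek/store would go to ElastiCache instead.
class CacheAside {
  constructor() {
    this.entries = new Map();  // key -> { value, expiresAt }
    this.inFlight = new Map(); // key -> Promise of a load that is still running
  }

  // Return a cached value if present and not expired. Note: a cached
  // `undefined` is indistinguishable from a miss in this sketch.
  peek(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // Read through the cache: call loadFn (e.g. a Mongo query) only on a miss.
  // Concurrent misses for the same key reuse the pending promise.
  get(key, ttlMs, loadFn) {
    const hit = this.peek(key);
    if (hit !== undefined) return Promise.resolve(hit);
    if (this.inFlight.has(key)) return this.inFlight.get(key);

    const pending = Promise.resolve()
      .then(loadFn)
      .then((value) => {
        this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
        return value;
      })
      // Runs on success or failure, so a failed load is retried next time.
      .finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, pending);
    return pending;
  }

  // Call after writing the underlying data so readers don't get stale values.
  invalidate(key) {
    this.entries.delete(key);
  }
}
```

A short TTL on hot GET endpoints would already cut down repeated identical reads during peaks; the trade-off is that readers may see data up to one TTL old unless the write path calls `invalidate`.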
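By "delay MongoDB write operations" I mean something like a write-behind buffer: queue documents in memory and send them to Mongo in batches, either when a batch fills up or after a short delay, instead of one round trip per request. A rough sketch, assuming a hypothetical `WriteBuffer` helper and a `flushFn` that in production might call the MongoDB driver's `insertMany` or `bulkWrite`; the batch size and delay are placeholder values.

```javascript
// Hypothetical write-behind buffer. flushFn(batch) is an assumption: in
// production it might wrap collection.insertMany(batch). Anything still in
// the buffer is lost if the process crashes, and a failed flush drops its
// batch, so this only suits writes that can tolerate that.
class WriteBuffer {
  constructor(flushFn, { maxBatch = 500, maxDelayMs = 1000 } = {}) {
    this.flushFn = flushFn;
    this.maxBatch = maxBatch;
    this.maxDelayMs = maxDelayMs;
    this.buffer = [];
    this.timer = null;
  }

  // Queue one document; flush right away if the batch is full, otherwise
  // make sure a timer will flush it within maxDelayMs.
  add(doc) {
    this.buffer.push(doc);
    if (this.buffer.length >= this.maxBatch) return this.flush();
    if (!this.timer) {
      this.timer = setTimeout(() => {
        this.flush().catch((err) => console.error('flush failed', err));
      }, this.maxDelayMs);
    }
    return Promise.resolve();
  }

  // Send everything currently buffered in a single flushFn call.
  async flush() {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    await this.flushFn(batch);
  }
}
```

The request handler can then respond as soon as `add` returns instead of waiting on Mongo. Where losing buffered writes is not acceptable, the same idea could use a queue in Redis rather than process memory.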
On optimising code, I've posted another Stack Overflow question with an example of the code patterns I'm following.
What is your advice for running applications like this in production?