I have a social app where users do the usual things found in other social apps: upload multiple text and photo posts every hour, like and comment (a notification is created for each action), view custom/native ads, block users (and their content), etc.

The app runs on Parse Server 2.8.4. For those who don't know, Parse Server uses Node.js, Express and MongoDB. I have one server for the app and another for the db, both hosted on DigitalOcean.

Here are their specs:

Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz x4
8 GB of RAM
SSD
Ubuntu 16.04

Normally, we have about 100-150 simultaneous users, who create about 500 posts, 2000 comments and 2000 likes a day, and each of them usually stays for about 40 minutes. Today, however, we reached 600 and the app froze completely! I checked the charts provided by DigitalOcean and every metric (CPU, RAM, etc.) was normal, 40-50 percent at most. The inbound and outbound bandwidth, on the other hand, peaked!

As you can see in the image below, every day we hit about 6 Mbps of inbound and 2.5 Mbps of outbound. Today, we hit over 10 Mbps of inbound and 15 Mbps of outbound!

[Image: DigitalOcean inbound/outbound bandwidth chart]

The app runs with pm2 on a single CPU. During the incident, we tried using all 4 of them, but nothing seemed to improve; it still froze. We also don't cache anything at the moment (though we will very soon). The db is indexed, but other than that not much has been done to improve performance. All photos are stored in DigitalOcean's S3-compatible object storage.

The question is: considering that every other metric was within its normal range and that the db is fairly well structured, do you think this bandwidth spike could cause a total freeze on the server, or would it not affect it at all? Could it be that the server we're using isn't good enough to support the app?

Also, how many users do you think our infrastructure should support? I know it depends on many factors, but based on what I described is it normal not to be able to handle 600 users?

Sotiris Kaniras

1 Answer

It does look like you hit a resource limit of your server or database.

do you think that this bandwidth spike could cause a total freeze on the server

Start by finding out what the "total freeze" really was by looking into the logs and metrics. Your post is missing essential information needed to reach a sound conclusion, namely the MongoDB metrics and instance metrics at the time of the freeze. If you haven't set up MongoDB monitoring, this may be a good time to do so.

For MongoDB, commonly hit constraints are:

  • CPU usage maxed out (especially if you make extensive use of aggregation)
  • IOPS credits for the underlying node disks used up (which usually indicates RAM shortage for the working set)
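As a rough illustration of the kind of MongoDB metric worth capturing, the sketch below takes two `serverStatus` documents sampled some seconds apart and turns their cumulative counters into rates. It assumes the WiredTiger storage engine and that the documents come from something like `db.command({ serverStatus: 1 })` in the Node.js driver; `serverStatusRates` and the sample numbers are made up for the example, and the available fields can vary by MongoDB version.

```javascript
// Hypothetical helper, not part of any library: compares two serverStatus
// documents taken `seconds` apart and converts cumulative counters to rates.
function serverStatusRates(before, after, seconds) {
  // Cumulative count of pages loaded from disk into the WiredTiger cache.
  const pagesRead = (s) =>
    (s.wiredTiger && s.wiredTiger.cache["pages read into cache"]) || 0;

  // Sum only the well-known operation counters.
  const opFields = ["query", "insert", "update", "delete", "getmore", "command"];
  const ops = (s) =>
    opFields.reduce((sum, field) => sum + (s.opcounters[field] || 0), 0);

  return {
    pagesReadPerSec: (pagesRead(after) - pagesRead(before)) / seconds,
    opsPerSec: (ops(after) - ops(before)) / seconds,
    connectionsCurrent: after.connections.current,
    connectionsAvailable: after.connections.available,
  };
}

// Made-up samples, shaped like the relevant parts of a serverStatus document.
const statusBefore = {
  wiredTiger: { cache: { "pages read into cache": 1000 } },
  opcounters: { query: 3000, insert: 500, update: 1000, delete: 100, getmore: 400, command: 1000 },
  connections: { current: 40, available: 800 },
};
const statusAfter = {
  wiredTiger: { cache: { "pages read into cache": 1600 } },
  opcounters: { query: 4500, insert: 800, update: 1500, delete: 150, getmore: 550, command: 1500 },
  connections: { current: 120, available: 720 },
};

console.log(serverStatusRates(statusBefore, statusAfter, 60));
```

A rate of pages read into the cache that climbs sharply during the incident would be consistent with the working set outgrowing RAM, though it is not proof on its own.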

For a server instance, it could be:

  • CPU usage maxed out
  • RAM usage maxed out
  • network limits
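On the CPU point specifically: you mention the app runs as a single process on a 4-core machine, so it may be worth looking at per-core usage rather than one averaged figure. The sketch below works on samples shaped like the output of Node's `os.cpus()`; the helper name and the numbers are invented for illustration, and whether the DigitalOcean chart averages across cores is an assumption you would need to verify.

```javascript
// Hypothetical helper: given two samples shaped like os.cpus() output,
// return the busy percentage of each core over the interval between them.
function perCoreBusyPct(before, after) {
  return after.map((cpu, i) => {
    const prev = before[i].times;
    const cur = cpu.times;
    // Total time elapsed on this core across all modes (user, sys, idle, ...).
    const total = Object.keys(cur).reduce((sum, k) => sum + (cur[k] - prev[k]), 0);
    const idle = cur.idle - prev.idle;
    return total > 0 ? 100 * (1 - idle / total) : 0;
  });
}

// Invented samples: core 0 fully busy, cores 1-3 fully idle for one second.
const zero = { user: 0, nice: 0, sys: 0, idle: 0, irq: 0 };
const sampleBefore = [0, 1, 2, 3].map(() => ({ times: { ...zero } }));
const sampleAfter = [
  { times: { ...zero, user: 1000 } },
  { times: { ...zero, idle: 1000 } },
  { times: { ...zero, idle: 1000 } },
  { times: { ...zero, idle: 1000 } },
];

const busy = perCoreBusyPct(sampleBefore, sampleAfter);
const average = busy.reduce((a, b) => a + b, 0) / busy.length;
console.log(busy, average); // one core at 100 %, yet the average is only 25 %

// Live use (not run here): take os.cpus() twice, about a second apart,
// and pass both results to perCoreBusyPct.
```

With a single-threaded Node.js process, one saturated core can be enough to stall request handling even when the averaged CPU figure looks moderate.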

Depending on your infrastructure, it may also be something more obscure, such as the limits of your reverse proxy or load balancer. In addition, in a virtualized environment such as DigitalOcean there may be other "artificial" limits that come into play.

how many users do you think our infrastructure should support?

It depends on the resources your app needs as traffic increases and on the limits of the server instance, the database instance, the network between the two, and the network to the public. Every app is different, so there is no generic answer to that.
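To get a feel for how your own app's needs scale, you can do some back-of-the-envelope arithmetic with the figures from your question. The sketch below divides the quoted bandwidth by the number of simultaneous users; it assumes 150 users for a normal day (the upper end of your 100-150 range), and `perUserKbps` is just an illustrative helper.

```javascript
// Illustrative helper: average bandwidth per simultaneous user, in kbit/s.
function perUserKbps(totalMbps, users) {
  return (totalMbps * 1000) / users;
}

// Figures quoted in the question; 150 normal users is an assumption.
const normalDay = { users: 150, inMbps: 6, outMbps: 2.5 };
const spikeDay = { users: 600, inMbps: 10, outMbps: 15 };

for (const [label, day] of Object.entries({ normalDay, spikeDay })) {
  console.log(label, {
    inKbpsPerUser: perUserKbps(day.inMbps, day.users).toFixed(1),
    outKbpsPerUser: perUserKbps(day.outMbps, day.users).toFixed(1),
  });
}
```

Under these assumptions, outbound traffic per user during the spike comes out at roughly 1.5 times the normal figure, so outbound traffic grew faster than the user count. That is worth explaining before sizing anything, since it suggests the per-user cost is not constant.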

Generally speaking, you would continuously adapt the resources to predicted traffic based on historic data, taking such spikes into account. For user-facing services it usually makes sense, economically and operationally, to set up a load balancer and an auto-scaling group to handle traffic fluctuations. You would also be better prepared for malicious attacks on your infrastructure that cause load increases, which will become more likely and more frequent as your service gets more popular.

Manuel