Scaling up my single-instance node.js/mongodb application on Amazon EC2 - starting from scratch

Question

I am a happy Amazon EC2 user. I followed some tutorials here and there (mostly this one) and successfully deployed a node.js app.

I currently have a t2.nano machine, which works fine as my test environment for a couple of users. My iOS app is almost ready, so before the big release I need to take care of scalability and prepare it for a wider group of users.

I optimistically assume that I will have 1000 users soon. I want to prepare the whole environment to work smoothly for at least that number of users. The problem is that I have no idea how I should even start configuring everything.

Correct me if I'm wrong, but I assume I need 2-3 t2.medium (?) machines, all of which run my node.js code and sit behind Amazon's Load Balancer. But what about the database?

If I get things right, I need to set up a mongodb master instance on one machine (one of those t2.medium mentioned above), and slave instances on the other two machines?

But - if I do so - what about replicating the data between the machines? Also - each node.js server running on each t2.medium machine has to have a connection to a database - should it point to the master one?

I tried to find a tutorial similar to the one I used to deploy a single app, but I'm struggling with it. I found this youtube video where a guy describes how he set up an environment with 5 mongodb machines (one master and the rest slaves), but I'm not sure whether that is a good direction here.

Could you guys please help me and give me any hint at this point? I don't even know where to start, so I will really appreciate anything. Thanks!

Prove nano is not enough. Load-test your app with http://locust.io/ or similar. Check bottlenecks, and scale weakest component. — Alex Blex, Mar 28 '17 at 08:24
Yes, you're right, `nano` is for sure not enough. At the moment I need to scale the environment, so I would like to know how I could do it. Do I need to set up e.g. 3 `medium` machines behind a single load balancer? And if yes, how can I connect them to my database? I thought about setting up 3 `medium` machines with node.js and 3 `medium` machines with the database - one master and two slaves - but then how can I connect those pieces into one working environment? Do I have to point each node.js server to my master database and hope that - in case of overload - a slave takes over automatically? — randomuser1, Mar 28 '17 at 09:17
Starting with nano, you have the luxury of both horizontal and vertical scalability. Which one to use, and which component to scale, depends on the results of load testing and a cost comparison. A typical setup uses the primary for reads and writes and the secondaries for redundancy, yet there are use cases for connecting to secondaries directly. When you connect to a replica set, it's the job of the driver to find out which node is primary and to switch to the next primary as soon as it is elected. You can rely on it. That said, a replica set makes your database slower; it has nothing to do with scale. — Alex Blex, Mar 28 '17 at 10:01

score 1 · Answer 1 · answered Mar 28 '17 at 15:53

I will try to answer this question step by step. Remember though, this is just one possible setup among many, and may not fit your needs entirely.

Topology:

AWS
Node.js
MongoDB

AWS - You mentioned you are now striving for 1000 users. All you said about your app is that it is iOS, so we have no idea how intensive the backend DB workload (CRUD operations) will be.

I would start off with an eye on scalability if and when you need it. Therefore I would strongly suggest what Amazon has termed a NAT Gateway. This will allow your Node.js to sit on the server facing the Internet, while the MongoDB sits behind it. First and foremost, this protects the Mongo from any unwanted access. In other words, an initial setup would be 1 Gateway (which would hold Node.js) and, behind the Gateway, what we will call for the time being the Master MongoDB. Access to this server is only through an SSH tunnel from your Gateway. It is on a CIDR address range.

Setting up a NAT Gateway correctly, though, is not as simple as 1-2-3. You really do need to understand how Amazon uses routes and, of course, how to make proper use of inbound and outbound rules.

The actual Gateway server (the Node.js server) should be on an Elastic IP. This will save you a lot of heartache when you need to move up to a better server. You should also take snapshots or actual images of your server any time you make a critical change to it.

As to the Gateway server itself: depending on your actual Node.js code, you should at the very least go for a t2.medium (a micro simply will not do). As for protection, again use inbound and outbound rules, and possibly put up a software firewall on the server.

I also use a few other utilities, but one that I think would be critical here is PM2. It will keep your Node code running in the event of restarts and make life easier as you expand onto more cores on the server(s).

You did not mention your choice of OS, so I will not touch that heated topic except to say that Ubuntu and Amazon Linux are sane choices.
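
As a rough illustration of the PM2 point, a minimal `ecosystem.config.js` might look like the sketch below. The app name `api` and the entry file `./server.js` are placeholders I made up, not something taken from your question; `instances: 'max'` and `exec_mode: 'cluster'` are the PM2 options that ask it to start one worker per CPU core.

```javascript
// ecosystem.config.js: a minimal PM2 sketch (names below are placeholders).
const pm2Config = {
  apps: [
    {
      name: 'api',            // hypothetical app name
      script: './server.js',  // hypothetical entry point of the Node.js app
      instances: 'max',       // one worker per available CPU core
      exec_mode: 'cluster',   // workers share the same listening port
      autorestart: true,      // restart a worker if it crashes
      env: { NODE_ENV: 'production' },
    },
  ],
};

module.exports = pm2Config;
```

You would start it with `pm2 start ecosystem.config.js`; `pm2 startup` followed by `pm2 save` is the usual way to bring the processes back after a reboot.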

As to your MongoDB, which is now protected behind your Gateway: here is where I may be a bit conservative, basically because I am dealing with the realities of the numbers you asked about. I personally see no reason, for 1000 users, to set up a cluster or shards on MongoDB until you see whether your app really does take off. The NAT gateway is scalable, and if your app takes off, you will be moving to Atlas or Enterprise anyway. However, if you want to cluster (master-slaves), you can choose either 3 or 7 (I think 7 may be 5). One is your master. You SSH tunnel into any of them the same way you SSH into your master (obviously over a different CIDR). If you do cluster, you have to pay really close attention to the mongod.conf file and all the parameters available to you.

Obviously there is a great deal more to write about, and of course many have differing opinions, which is a good thing. However, I would err here on the side of caution (and your monthly bill!) until you have the basic setup you need working. Then you can scale your Mongo into clusters and sharding and your Node.js app as well.

Just to sum up:

Inbound and Outbound rules & Routes in AWS
OS you will use
Actual server and core configuration
Node.js on the Gateway server facing the world
MongoDB server (or cluster) behind the Gateway.
Critical: You should almost immediately create roles in your MongoDB. This too is the last line of defense. Do not ever have an instance of MongoDB without at least an Admin role set up, so you will have control over what rights are required to access or change your data.
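
To make the last point a little more concrete, here is a small sketch that builds the document for MongoDB's `createUser` database command with a single `readWrite` role. The user name `appUser`, the database name `myapp`, and the `APP_DB_PASSWORD` environment variable are all hypothetical; the commented-out call at the end assumes the official `mongodb` Node.js driver and an already connected `client`.

```javascript
// Builds the document for MongoDB's `createUser` database command.
// The role is limited to read/write on a single database.
function buildCreateUserCommand(username, password, dbName) {
  if (!username || !password) {
    throw new Error('username and password are required');
  }
  return {
    createUser: username,
    pwd: password,
    roles: [{ role: 'readWrite', db: dbName }],
  };
}

// Hypothetical values; in practice the password should come from a secret store.
const cmd = buildCreateUserCommand(
  'appUser',
  process.env.APP_DB_PASSWORD || 'change-me',
  'myapp'
);

// Print the command with the password masked.
console.log(JSON.stringify({ ...cmd, pwd: '***' }));

// With the official driver (assumed here) it would be sent roughly as:
//   await client.db('myapp').command(cmd);
```

Roles only take effect once authorization is enabled on the server (`security.authorization: enabled` in mongod.conf), so it is worth checking that setting at the same time.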

Good luck - hope it all works even better than you dreamed!

First of all, thank you very much for the patience to write all of it. I've read it once, I've read it twice, and that is definitely not the last time. I see it's a longer path, but also quite an interesting one. As soon as I fully understand all those words from your answer, I will analyze it and see how I can proceed. In the meantime though, I have one more question - I've read some other questions and answers and I thought about setting up - just for the beginning - a setup like this: 1 `load balancer` that points to two `t2.medium` machines with `node.js` code, and one `t2.medium`... — randomuser1, Mar 28 '17 at 17:30
that stores the `mongo` database. From then, if the app fires up and gets popular (:D), I would consider adding more machines to load balancer and possibly upgrading the one with `mongo` too. Do you think it might be a healthy environment for the beginning? I'm asking about it because I feel I could set up environment like this within couple hours, instead of spending days at the moment digging through more complex solutions. What do you think? — randomuser1, Mar 28 '17 at 17:33
@randomuser1 Take a look at the [NAT Gateway](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-nat-gateway.html) mentioned above in my post. Two t2's would be fine as long as you provision them in setup with enough disk space. Keep in mind, MongoDB **absolutely loves memory** and you really need to set up the params in the mongod.conf correctly if you are doing heavy CRUD. Once you get the hang of it, you can think about real scaling, _if and when needed_. Don't go overboard, as you do have that monthly bill to think about. :) — twg, Mar 28 '17 at 18:16

score 0 · Answer 2 · answered Mar 28 '17 at 08:24

This answer is restricted to the question of your database servers:

For the databases, you certainly should configure a replica set; but for the sake of redundancy and availability, not to support a higher load. There are some good instructions on how to deploy in the MongoDB docs.
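
As a sketch of how the Node.js servers would then connect: each server uses one connection string that lists the replica set members, and the driver works out which member is currently primary. The host addresses, the database name `myapp`, and the replica set name `rs0` below are made-up placeholders, and the commented-out lines assume the official `mongodb` driver.

```javascript
// Builds a MongoDB connection string that lists all replica set members.
function buildReplicaSetUri(hosts, dbName, replicaSetName) {
  if (!Array.isArray(hosts) || hosts.length === 0) {
    throw new Error('at least one host is required');
  }
  const query = new URLSearchParams({ replicaSet: replicaSetName, w: 'majority' });
  return `mongodb://${hosts.join(',')}/${dbName}?${query.toString()}`;
}

// Hypothetical private addresses of three replica set members.
const uri = buildReplicaSetUri(
  ['10.0.1.10:27017', '10.0.1.11:27017', '10.0.1.12:27017'],
  'myapp',
  'rs0'
);
console.log(uri);

// Every Node.js server behind the load balancer would use the same URI:
//   const { MongoClient } = require('mongodb');
//   const client = new MongoClient(uri);
//   await client.connect();
```

The application code does not need to know which machine is primary, so the same configuration can be deployed to every app server.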

If you want to be sure the databases will support a higher load, then you need to make sure that the hardware (CPU, RAM, disk IO) is up to the job. It's hard to know in advance exactly what level of hardware provision you will need, so I recommend doing some load testing to find out how a given set of hardware responds.

answered Mar 28 '17 at 08:24

Vince Bowdren


Thanks Vince, I will take a look into it. When I checked your second link just now, it seemed like the exact same thing that the guy is doing in the youtube tutorial that I mentioned in my original question. At the moment I don't know how to handle it together with my node.js setup though... I'm not sure if I should split the code across separate machines with a load balancer in front, and if so, how to connect it with my database replica set – randomuser1 Mar 28 '17 at 08:29
