To give you an idea of the data:
The DB has collections/tables with over a hundred million documents/records each, and each document contains more than 100 attributes/columns. The data size is expected to grow a hundredfold soon.

Operations on the data:
There are mainly the following types of operations on the data:

  1. Validating and then importing the data into the DB; this happens multiple times daily
  2. Aggregations on this imported data
  3. Searches/finds
  4. Updates
  5. Deletes

Tools/software used:

  1. MongoDB for the database: a PSS-architecture (Primary, Secondary, Secondary) replica set with indexes (most of the queries are index scans)
  2. Node.js with Koa.js

Problems:
However, the application is very slow when it comes to aggregations, finds, etc.

What I have implemented for performance so far:

  1. DB Indexing
  2. Caching
  3. Pre-aggregations (using MongoDB's aggregate to aggregate the data beforehand during import and store the results in separate collections, so aggregations are avoided at runtime; see the sketch after this list)
  4. Increased RAM and CPU cores on the DB server
  5. A separate server for the Node.js application and the front-end build
  6. PM2 to manage the Node.js application and to spawn cluster workers
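
For illustration, here is a minimal sketch of such a pre-aggregation step (item 3) using the official Node.js MongoDB driver. The connection string, database, collection, and field names (events, dailyTotals, accountId, day, amount) are hypothetical placeholders; the $merge stage requires MongoDB 4.2 or later.

    const { MongoClient } = require("mongodb");

    async function preAggregate() {
      const client = await MongoClient.connect("mongodb://localhost:27017");
      const db = client.db("mydb");

      // Group the freshly imported raw data and persist the result into a
      // separate collection, so runtime queries read the small pre-computed
      // collection instead of aggregating 100M+ documents on the fly.
      await db.collection("events").aggregate([
        { $group: {
            _id: { accountId: "$accountId", day: "$day" },
            total: { $sum: "$amount" },
            count: { $sum: 1 }
        } },
        // $merge upserts into the target collection (MongoDB >= 4.2)
        { $merge: { into: "dailyTotals", whenMatched: "replace", whenNotMatched: "insert" } }
      ]).toArray(); // draining the cursor makes the pipeline (and $merge) run

      await client.close();
    }

    preAggregate().catch(console.error);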

However, even after implementing all of the above, the application is still not performant enough. I suspect the reason is simply that the data is huge. I do not know how Big Data applications are engineered to deliver high performance. Please advise.

Also, is my choice of technology unsuitable, and would changing the technology/tools help? If so, what is recommended in such scenarios?

I'm requesting your advice to help me improve the performance of the application.

Temp O'rary

1 Answer

It is not easy to give a definitive answer because we do not really have many details. What I would do is set up detailed monitoring, covering at least the following:

Machine Level:

  • monitor the overall CPU load (for all cores) and RAM usage on your DB machine
  • monitor disk IO on the disks where the data is stored
  • this should show whether the machine specs are a bottleneck (a minimal probe is sketched below)
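
As a minimal sketch of such a machine-level probe from Node.js (built-ins only; the /proc/diskstats read assumes Linux, and the 5-second interval is arbitrary):

    const os = require("os");
    const fs = require("fs");

    setInterval(() => {
      console.log({
        loadAvg1m: os.loadavg()[0],                        // 1-minute load average
        cores: os.cpus().length,
        freeMemMB: Math.round(os.freemem() / 1024 / 1024),
        totalMemMB: Math.round(os.totalmem() / 1024 / 1024),
      });
      // Raw per-device IO counters; diff two successive samples to get rates.
      console.log(fs.readFileSync("/proc/diskstats", "utf8"));
    }, 5000);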

Database & DB Process Level (my first guess, that this is the critical part):

  • what is the overall size of your data at the moment? (I know it will increase drastically, but if it is already too slow now, this is interesting information, especially in relation to the current RAM size and number of CPU cores)
  • monitor memory usage and CPU load for your mongod process
  • has a look at the query plans (while doing aggregations) shown you what improvements can be made?
  • have a look at the caching strategy: what strategy are you using?
  • this should give more detailed results on where to make improvements at the DB level: is it just a hardware bottleneck, or is it an aggregation problem? (a sketch of how to pull these numbers follows below)
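
A minimal sketch of pulling data size, mongod memory usage, and an aggregation query plan from Node.js; the database, collection, and pipeline are hypothetical placeholders:

    const { MongoClient } = require("mongodb");

    async function inspectDb() {
      const client = await MongoClient.connect("mongodb://localhost:27017");
      const db = client.db("mydb");

      // Overall data and index size, scaled to MB
      console.log(await db.command({ dbStats: 1, scale: 1024 * 1024 }));

      // Memory section of serverStatus: resident/virtual size of mongod
      const status = await db.command({ serverStatus: 1 });
      console.log(status.mem);

      // Query plan of a representative aggregation: watch for COLLSCAN
      // stages and a high totalDocsExamined / nReturned ratio.
      const plan = await db.collection("events")
        .aggregate([{ $match: { day: "2024-01-01" } }])
        .explain("executionStats");
      console.log(JSON.stringify(plan, null, 2));

      await client.close();
    }

    inspectDb().catch(console.error);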

Node.js App Level:

  • how much RAM and CPU does the Node.js app consume?
  • if there are multiple instances of the Node.js app, track this for all instances
  • does the data import also happen through the Node.js app? Does the load on the app increase drastically while importing data?
  • if you see a high load on this app, there is a need to act here (increase the number of instances, or split it into separate apps, e.g. the import as a separate app); a per-process probe is sketched below
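
As a minimal sketch of such a per-process probe inside the Koa app (with PM2 clustering, each worker logs its own numbers; the 10-second interval is arbitrary):

    let lastCpu = process.cpuUsage();

    setInterval(() => {
      const mem = process.memoryUsage();
      const cpu = process.cpuUsage(lastCpu); // CPU time spent since the last sample
      lastCpu = process.cpuUsage();
      console.log({
        pid: process.pid,                    // distinguishes PM2 cluster workers
        rssMB: Math.round(mem.rss / 1024 / 1024),
        heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
        cpuMs: Math.round((cpu.user + cpu.system) / 1000), // microseconds -> ms
      });
    }, 10000);
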
Sebastian Hildebrandt