To give you an idea of the data:
The DB has collections/tables with over a hundred million documents/records each, and each document contains more than 100 attributes/columns. The data size is expected to grow a hundredfold soon.
Operations on the data:
The main operations performed on the data are:
- Validating and then importing the data into the DB; this happens multiple times daily (a rough sketch of this step is shown right after this list)
- Aggregations on this imported data
- Searches/finds
- Updates
- Deletes
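
For the import step in particular, the flow is roughly the following. This is a simplified sketch only; the real validation rules, database, collection, and field names (`mydb`, `documents`, `accountId`, `createdAt`) are placeholders:

```js
// Simplified sketch of one daily import run (names are placeholders)
const { MongoClient } = require('mongodb');

async function importBatch(records) {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const collection = client.db('mydb').collection('documents');

  // Minimal validation: keep only records that have the required fields
  const valid = records.filter((r) => r.accountId != null && r.createdAt != null);

  if (valid.length > 0) {
    // Unordered insert so a single bad document doesn't abort the whole batch
    await collection.insertMany(valid, { ordered: false });
  }

  await client.close();
}
```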
Tools/software used:
- MongoDB as the database: a PSS-architecture replica set with indexes in place (most queries are index scans; see the explain sketch right after this list)
- Node.js with Koa.js
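
To illustrate what I mean by index scans, this is roughly how I check the query plans in mongosh (collection, field names, and the index shape here are placeholders, not the real schema):

```js
// Compound index matching the most common query shape (placeholder names)
db.documents.createIndex({ accountId: 1, createdAt: -1 });

// Verify the planner reports IXSCAN rather than COLLSCAN for a typical find
db.documents.find({ accountId: 42 })
  .sort({ createdAt: -1 })
  .explain('executionStats');
```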
Problems:
However, the application is very slow when it comes to aggregations, finds, etc.
What I have implemented for performance so far:
- DB Indexing
- Caching
- Pre-aggregations (using MongoDB's aggregate to aggregate the data beforehand during import and store it in separate collections, so that aggregations are avoided at runtime; see the pipeline sketch at the end of this list)
- Increased RAM and CPU cores on the DB server
- A separate server for the Node.js application and the front-end build
- PM2 to manage the Node.js application and spawn cluster workers
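
Regarding the pre-aggregation point above, the pipeline is shaped roughly like this. It is a minimal sketch with placeholder collection and field names (`documents`, `account_summaries`, `importBatchId`, `amount`), assuming MongoDB 4.2+ for `$merge`:

```js
// Run during import: roll the freshly imported batch up into a summary
// collection so reads don't aggregate the full collection at runtime.
// (Collection and field names are placeholders.)
db.documents.aggregate([
  { $match: { importBatchId: 'batch-2024-01-01' } },      // only the new batch
  {
    $group: {
      _id: '$accountId',
      totalAmount: { $sum: '$amount' },
      docCount: { $sum: 1 },
    },
  },
  // Upsert the grouped results into the pre-aggregated collection
  { $merge: { into: 'account_summaries', whenMatched: 'replace', whenNotMatched: 'insert' } },
]);
```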
However, in my experience, even after implementing all of the above, the application is still not performant enough. I suspect the reason is simply the size of the data. I am not aware of how Big Data applications are architected to deliver high performance. Please advise.
Also, is my choice of technology unsuitable? Would changing the technology/tools help, and if so, what is recommended in such scenarios?
I would appreciate your advice on how to improve the performance of this application.