I am building a report engine in Node.js that pulls from multiple applications and multiple databases. The problem is that I can't think of a way to merge two big data sets into one result without waiting for both streams to finish.
For example, let's say this is my first stream:
const mssql = require('mssql');

const request = new mssql.Request();
request.stream = true; // emit rows as they arrive instead of buffering the result set
request.query('SELECT id, transaction, delivery FROM LargeTable');
request.on('row', row => {
  processStream(row); // handle each row as soon as it arrives
});
And this is my second stream:
const mysql = require('mysql');

const connection = mysql.createConnection(...); // connection config omitted
connection.query('SELECT id, transaction, supplydrop FROM AnotherLargeTable')
  .stream()             // readable stream of rows
  .pipe(processStream); // processStream acts as a writable/transform stream here
Those are only two example tables. The real data sets will be determined by the user, and we can't assume they will use mssql and mysql only; there could be other sources the user wants to merge, such as files, CSVs, or data consumed from the web. Each source runs at a different speed and bandwidth.
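To make that concrete, I picture every source being normalized into an object-mode stream of rows before any merging happens. Here is a minimal sketch for a CSV source (the toRowStream helper and the 'supplies.csv' file are just illustrative, and I'm assuming the csv-parse package):

const fs = require('fs');
const { parse } = require('csv-parse'); // parses CSV input into plain objects

// Hypothetical helper: turn a CSV file into an object-mode stream of rows,
// shaped like the rows coming out of the SQL streams above.
function toRowStream(csvPath) {
  return fs.createReadStream(csvPath)
    .pipe(parse({ columns: true })); // emits one object per CSV row
}

toRowStream('supplies.csv').on('data', row => processStream(row));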
Now, if the user wants to aggregate and filter the data, I can't simply write something like:
SELECT id, transaction, delivery, supplydrop
FROM LargeTable lt1
INNER JOIN AnotherLargeTable lt2 ON lt1.transaction = lt2.transaction
WHERE delivery = 'Customer1' AND supplydrop = 'bad supplier'
The problem is that I would have to wait until both streams end, buffer all the rows, and then join them, which would consume a large chunk of memory and could potentially crash the system. I can push some of the filtering down into each data source, but the join itself can only happen once both streams are complete.
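In other words, the naive approach I'm trying to avoid looks roughly like this, with both tables buffered in memory before the join can run (names are illustrative):

const deliveries = []; // every row from stream 1
const supplies = [];   // every row from stream 2

// Buffer the whole of both tables in memory...
request.on('row', row => deliveries.push(row));
connection.query('SELECT id, transaction, supplydrop FROM AnotherLargeTable')
  .stream()
  .on('data', row => supplies.push(row));

// ...and only after BOTH streams have ended can the join run.
function joinWhenDone() {
  const byTransaction = new Map(supplies.map(s => [s.transaction, s]));
  return deliveries
    .filter(d => d.delivery === 'Customer1')
    .map(d => ({ ...d, ...byTransaction.get(d.transaction) }))
    .filter(r => r.supplydrop === 'bad supplier');
}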
Is there any way to work with both data sets without exhausting too much memory? Or is there some kind of small, memory-based database engine that I could pipe both streams into and join them there? Would an in-memory database solution be overkill, or is there something else that fits this particular job nicely?
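For the in-memory database idea, this is roughly what I am picturing (better-sqlite3 is just one example of a small embedded engine, not a requirement; table names are made up, column names are the ones from above):

const Database = require('better-sqlite3');

const db = new Database(':memory:');
db.exec('CREATE TABLE deliveries (id INT, "transaction" TEXT, delivery TEXT)');
db.exec('CREATE TABLE supplies (id INT, "transaction" TEXT, supplydrop TEXT)');

// Pipe stream 1 into the deliveries table as rows arrive
const insertDelivery = db.prepare('INSERT INTO deliveries VALUES (?, ?, ?)');
request.on('row', row => insertDelivery.run(row.id, row.transaction, row.delivery));

// (same idea for stream 2 into supplies, then once both streams have ended:)
const report = db.prepare(`
  SELECT d.id, d."transaction", d.delivery, s.supplydrop
  FROM deliveries d
  INNER JOIN supplies s ON s."transaction" = d."transaction"
  WHERE d.delivery = ? AND s.supplydrop = ?
`).all('Customer1', 'bad supplier');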
Thank you for your time.