
The title already sums it up.

I am currently facing this task: A SELECT statement results in 3.6 million records with many columns. This complete pack of data should be sent to the client, so that the browser can display everything in a pivot grid. So pagination is not an option, unfortunately.

The used stack is MariaDB, Node.js v8.11.3, Sequelize v3 and DevExtreme on the client side.

As you can imagine, the node server crashes with this amount of data.

I'd really appreciate any idea how this could be realized or if you have experience with such a task and came to the conclusion that this cannot be done with a web application, yet.

Thank you very much and have a great day!

Patrick

  • Can a web client even show 3.6M rows without scrolling? – Ratul Sharker Jan 16 '20 at 14:27
  • @RatulSharker Thanks for your question. Well, that would be the next task. There should be the option of virtual scrolling or something similar, so that the records would be stored in memory, but not shown at once. So the user would definitely have to scroll. But since the user is able to rearrange columns in the pivot grid, all the data has to be available. – Patrick Jan 16 '20 at 14:43
  • What human will survive scrolling through 3.6M rows?? Use the database to summarize the data is some usable way. – Rick James Jan 20 '20 at 05:23

2 Answers


To answer your question: YOU CANNOT.

Providing huge datasets directly to the client is generally a bad idea. Suppose each row in your example is 1 KB: you would be loading roughly 3.6 GB of data. All of it has to travel over the network, so either the server or the client will run out of memory, and even if neither does, the result will be painfully slow to load and then to use.

There is no straightforward solution here, but you can avoid sending that much data by rethinking what the client actually needs and sending only aggregations of the data, or by using a client library that paginates (the DevExpress pivot table does not).

Just to answer part of the question: you can avoid crashing Node.js by not loading the whole dataset into memory before sending it to the client, using techniques such as streaming or WebSockets.

Diego Dupin

3.6 million rows. Let's assume each row holds 128 bytes of data: that is 460,800,000 bytes, or about 439 MB of raw data you are trying to select, which doesn't sound too bad. But this data will be wrapped in objects and models and turned into JSON, so the memory requirement grows at least roughly tenfold: about 4.3 GB.

Okay, still not too bad. Now we need to push it to the browser, wrap it in HTML and JSON, style it, and so on. We'll be pushing roughly 1.4 GB of JSON to the client. The client downloads it happily, the JSON arrives in the browser, and it gets parsed into objects, multiplying memory by roughly four: 5.6 GB. Not too shabby, except the browser will have given up long before, because it has a memory limit of around 256 MB per tab (I ran into this while coding a game; the exact limit varies per browser).
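The back-of-envelope arithmetic above, written out (all multipliers are the rough estimates from the text, not measurements):

```javascript
const rows = 3.6e6;
const bytesPerRow = 128;

const rawBytes = rows * bytesPerRow;            // 460,800,000 bytes
const rawMB = rawBytes / (1024 * 1024);         // ≈ 439 MB of raw data
const serverGB = (rawBytes * 10) / 1024 ** 3;   // ~10x overhead as objects ≈ 4.3 GB
const clientGB = 1.4 * 4;                       // 1.4 GB of JSON parsed ≈ 5.6 GB
```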

But let's say it's a custom, unbounded browser that can just do it. It iterates over the JSON and builds a spreadsheet-like display: creating all the DOM nodes, attaching them to the DOM tree, attaching event handlers, and so on. Memory times 20: 112 GB.

So the customer has a big gaming rig with an incredible amount of RAM, a browser that can handle that address space, and an OS that can handle all this. Now you get into the fun territory of paging. The OS has to page that RAM out because most of it goes unused, and it has higher-priority tasks to run while the user stares at the screen; no microsecond goes unspent. Write to disk, read from disk on every scroll, killing your client's hard drive.

In short: the browser won't allow it, because it has a memory limit. Explain to your customer that what he wants would require a custom OS, a custom browser, and a custom computer, and would still be slow because of CPU limitations.

Just do what Google Docs does: load on demand. When the user scrolls, load exactly the data needed for display, no more, no less, and unload data that has been off screen for five minutes, to stay under the 256 MB limit. Once you have the query written, it is just a matter of setting an offset and limiting the number of results; the rest of the stack works the same.
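A sketch of the windowed-loading math, assuming a virtual-scrolling grid that reports the first visible row. `windowQuery` and its parameters are made-up names for illustration; the result maps directly onto SQL's `LIMIT`/`OFFSET` (or Sequelize's `{ offset, limit }` options):

```javascript
// Given the first visible row index, the number of rows that fit in
// the viewport, and a buffer of extra rows above and below (so small
// scrolls don't trigger a query), compute the query window.
function windowQuery(firstVisibleRow, visibleRows, buffer) {
  const offset = Math.max(0, firstVisibleRow - buffer);
  const limit = visibleRows + 2 * buffer;
  return { offset, limit };
}
```

With Sequelize that window would become something like `Model.findAll({ offset, limit, order: [['id', 'ASC']] })`; a stable `ORDER BY` is required, otherwise the pages won't line up between queries.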

The real world has limits, your clients wishes do not. Bring them in balance.

Tschallacka