
I have a table that I am trying to read in JavaScript, with data that is large enough that I would like to have it in binary format to save space. Most of the table is either numbers or enums, but some of the data is strings. I'm trying to figure out the best way to read that data in JavaScript. Any suggestions?

FYI, the zipped-up text version is ~33 MB, while the zipped-up binary file is ~20 MB, which is why I am trying to reduce the size, hopefully allowing it to load faster. I'm planning on doing some preprocessing of the data with Python, which will reduce it to just the parts I actually care about. I may be doing some filtering of this data, but ideally I would like to end up with a static website, so I would strongly prefer not to rely on server-side code of any kind. The file that contains almost everything of note is only ~1 MB, so hopefully not a huge concern.

Bonus points if whatever system you suggest allows for a 16-bit float, as there are a number of floats for which 16 bits is more than enough precision.

My code, simple as it is thus far, is as follows. Note the missing block where anything useful happens...

function loadBinaryFloatArray(url) {
    
    var mRequest = new XMLHttpRequest();
    mRequest.open('GET', url);
    mRequest.responseType = 'arraybuffer';

    mRequest.onreadystatechange = function () {
        if (this.readyState === 4) {
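            // this.response is an ArrayBuffer at this point; unpacking it is the part I'm stuck on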


        }
    };
    mRequest.send();
}

console.log("Test")

loadBinaryFloatArray("/data/sbdb_important.bin")

The data looks something like this. Note that I will probably be removing some of these columns, but this is the source data. For reference, only the first column will actually be stored as a string; the others can be stored as numbers of some kind or as enumerations converted to numbers. There will probably be two actual strings in the end, as I think I'm going to split the one string column into its two pieces.

[screenshot of the source table]

For reference, the Python code to pack the data is as follows. I can change the e's (half-precision floats) to something else if required, either float32s or integers that convert to decimals, but it would be nice to support them natively.

struct.pack('ehifeefff',diameter,type,epoch,a,e,i,om,w,ma)
PearsonArtPhoto

1 Answer


These kinds of problems usually need bespoke solutions and there's a lot of tuning involved. But, generally, I think it could go like this:

Split the data up by column type and retrieve it via separate, but simultaneous, requests, one per type. The string columns would be just a JSON array of arrays of strings; HTTP compression will take care of minimizing the download size. Floats and ints can be sent as binary arrays of the appropriate type (see JavaScript typed arrays). Enums could be strings (HTTP compression could make this the faster download) or ints plus a mapping; testing would tell which is generally faster.
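Roughly, the download side could look something like this. This is only a sketch: the file names, extensions, and column choices are placeholders for whatever your Python step actually writes out, and it assumes each column is exported to its own file.

async function loadColumns() {
    // One request per column type, issued simultaneously.
    const responses = await Promise.all([
        fetch('/data/names.json'),   // string column as JSON; HTTP compression shrinks it
        fetch('/data/diameter.f32'), // float column as raw little-endian float32s
        fetch('/data/type.u8')       // enum column as one byte per row
    ]);

    const names = await responses[0].json();                            // array of strings
    const diameter = new Float32Array(await responses[1].arrayBuffer()); // assumes a little-endian platform (all mainstream browsers)
    const type = new Uint8Array(await responses[2].arrayBuffer());

    return { names: names, diameter: diameter, type: type };
}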

In the browser, recombine the data. The table seems large, so you'll likely want a table paging library to handle presentation. Otherwise, use efficient strategies for inserting large numbers of elements. This is usually done by building a large array of text chunks representing the final HTML, .join('')ing it, then injecting it all at once into the document.
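In very rough form, the join-and-inject approach might look like this. The renderPage name, the column names, and the table-container element are placeholders carried over from the sketch above; real code would also need HTML escaping and paging.

function renderPage(columns, start, count) {
    // Build the markup as an array of string chunks...
    const chunks = ['<table><tbody>'];
    for (let i = start; i < start + count; i++) {
        chunks.push(
            '<tr><td>', columns.names[i],          // real code should HTML-escape strings
            '</td><td>', columns.diameter[i].toFixed(3),
            '</td><td>', String(columns.type[i]),
            '</td></tr>'
        );
    }
    chunks.push('</tbody></table>');
    // ...then join once and inject in a single operation.
    document.getElementById('table-container').innerHTML = chunks.join('');
}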

Ouroborus
    Separating them into separate files might just be what is required, sadly. This might be the way to manage it; I'll wait to see if I hear anything better before going forward. Thanks! – PearsonArtPhoto Dec 06 '21 at 01:50