I am currently trying to save a JS object that contains some binary data along with other values. The result should look something like this:

{
  "value":"xyz",
  "file1":"[FileContent]",
  "file2":"[LargeFileContent]"
}

Until now I had no binary data, so I saved everything as JSON. With binary data I am starting to run into problems with large files (>1GB).

I tried the approach from "JSON.stringify or how to serialize binary data as base64 encoded JSON?", which worked for smaller files of around 20MB. With the large files, however, the result of the FileReader is always an empty string. The result then looks like this:

{
   "value":"xyz",
   "file1":"[FileContent]",
   "file2":""
}

The code that is reading the blobs is pretty similar to the one in the other post:

const readFiles = async (measurements: FormData) => {
    setFiles([]); // This is where the result is being stored
    let promises: Array<Promise<string>> = [];
    measurements.forEach((value) => {
      let dataBlob = value as Blob;
      console.log(dataBlob); //Everything is fine here
      promises.push(
        new Promise((resolve, reject) => {
          const reader = new FileReader();
          reader.readAsDataURL(dataBlob);
          reader.onloadend = function () {
            resolve(reader.result as string);
          };
          reader.onerror = function (error) {
            reject(error);
          };
        })
      );
    });
    let result = await Promise.all(promises);
    console.log(result); //large file shows empty
    setFiles(result);
  };

Is there something else I can try?

flonair22
  • `something else` - you haven't actually shown any code ... just data ... anyway, any errors in your developer tools console that may shed light on what you may be doing wrong? – Jaromanda X Jul 06 '22 at 08:46
  • Why are you trying to save such an object? Who should read it back? When? If on the same browser later on, then IndexedDB is probably the best suited for this. If it needs to work across devices, the easiest might be to write your own binary format, e.g. with a simple header stating where your JSON string is, then moving the binary data after that string and putting it back in the JS object only after parsing the JSON. – Kaiido Jul 06 '22 at 08:56
  • I have updated the question with some code. @Kaiido What I am trying to do is to analyse these files that are being uploaded. For the analysis it is possible to provide a configuration. I wanted to save the whole progress into a file so an analyst could save the progress and continue later on. The file containing everything should also be sent to a server that is doing the actual analysis on the data. Until now the client and the server have been on the same computer, so instead of saving the file I was able to just save an absolute path. I want to change this now. – flonair22 Jul 06 '22 at 09:04
  • @JaromandaX Sadly there is no error message whatsoever. I believe the sole problem is that the file is too big. – flonair22 Jul 06 '22 at 10:22
  • what is "too big" - what size does this fail with – Jaromanda X Jul 06 '22 at 10:28
  • so, if you `console.log(error)` as well as `reject(error);` ... still no errors in the console? – Jaromanda X Jul 06 '22 at 10:31

1 Answer

Since you have to share the data with other computers, you will have to generate your own binary format.

Obviously you can make it as you wish, but given your simple case of just storing Blob objects with a JSON string, we can come up with a very simple schema where we first store some metadata about the Blobs, then the Blobs themselves, and finally the JSON string in which each Blob has been replaced with a unique ID.

This works because the limit you hit is actually the maximum length of a string, and we can .slice() our binary file to read only part of it. Since we never read the binary data as a string, we're fine: the JSON only holds a short ID in places where we had Blobs, so it shouldn't grow too much.
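
To make the slicing part concrete, here is a small sketch of the offset arithmetic for a layout of this kind (blob count, one 4-byte size per blob, the blobs, then the JSON). The helper name `computeLayout` and the field names it returns are made up for illustration; they are not part of any API.

```javascript
// Hypothetical helper: given the blob sizes in bytes, compute where each
// part of the file starts and ends, so that each part can be read with
// file.slice(start, end) instead of loading the whole file at once.
function computeLayout(blobSizes) {
  // Assumption: every header field is a 32-bit unsigned integer (4 bytes).
  const BYTES_PER_FIELD = 4;
  // Header = 1 field for the blob count + 1 field per blob size.
  const headerEnd = BYTES_PER_FIELD * (blobSizes.length + 1);
  let cursor = headerEnd;
  const blobRanges = blobSizes.map((size) => {
    const range = { start: cursor, end: cursor + size };
    cursor += size;
    return range;
  });
  // Whatever comes after the last blob is the JSON string.
  return { headerEnd, blobRanges, jsonStart: cursor };
}

const layout = computeLayout([38, 1024]);
console.log(layout);
```

Only the header and the JSON tail ever need to be decoded; the blob ranges can stay as lazy slices of the original file.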

Here is one such implementation I made quickly as a proof of concept:

/*
 * Stores JSON data along with Blob objects in a binary file.
 * Schema:
 *   4 first bytes = # of blobs stored in the file
 *   next 4 * # of blobs = size of each Blob (32-bit, so max 4GiB per Blob)
 *   next sum of all sizes = the Blobs themselves, in the same order
 *   remaining = JSON string
 *
 */
const hopefully_unique_id = "_blob_"; // <-- change that
function generateBinary(JSObject) {
  let blobIndex = 0;
  const blobsMap = new Map();
  const JSONString = JSON.stringify(JSObject, (key, value) => {
    if (value instanceof Blob) {
      if (blobsMap.has(value)) {
        return blobsMap.get(value);
      }
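      // Use the same 1-based id for the map entry and the returned placeholder
      // (readBinary subtracts 1 to get the index into its blobs array).
      const id = hopefully_unique_id + (++blobIndex);
      blobsMap.set(value, id);
      return id;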
    }
    return value;
  });
  const blobsArr = [...blobsMap.keys()];
  const data = [
    new Uint32Array([blobsArr.length]),
    ...blobsArr.map((blob) => new Uint32Array([blob.size])),
    ...blobsArr,
    JSONString
  ];
  return new Blob(data);
}

async function readBinary(bin) {
  const numberOfBlobs = new Uint32Array(await bin.slice(0, 4).arrayBuffer())[0];
  let cursor = 4 * (numberOfBlobs + 1);
  const blobSizes = new Uint32Array(await bin.slice(4, cursor).arrayBuffer());
  const blobs = [];
  for (let i = 0; i < numberOfBlobs; i++) {
    const blobSize = blobSizes[i];
    blobs.push(bin.slice(cursor, cursor += blobSize));
  }
  const pattern = new RegExp(`^${hopefully_unique_id}\\d+$`);
  const JSObject = JSON.parse(
    await bin.slice(cursor).text(),
    (key, value) => {
      if (typeof value !== "string" || !pattern.test(value)) {
        return value;
      }
      const index = +value.replace(hopefully_unique_id, "") - 1;
      return blobs[index];
    }
  );
  return JSObject;
}
// demo usage
(async () => {
  const obj = {
    foo: "bar",
    file1: new Blob(["Let's pretend I'm actually binary data"]),
    // This one is 512MiB, which is bigger than the max string size in Chrome
    // i.e it can't be stored in a JSON string in Chrome
    file2: new Blob([new Uint8Array(512 * 1024 * 1024).fill(255)]),
  };
  const bin = generateBinary(obj);
  console.log("as binary", bin);
  const back = await readBinary(bin);
  console.log({back});
  console.log("file1 read as text:", await back.file1.text());
})().catch(console.error);
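
One caveat if the file is going to be read on another machine or by a non-JavaScript server: `Uint32Array` writes numbers in the byte order of the platform, and a 32-bit size caps each Blob at 4GiB. Assuming you are free to change the format, a header written through `DataView` with explicit little-endian 64-bit sizes would avoid both. This is only a sketch, and `encodeHeader` / `decodeHeader` are hypothetical names:

```javascript
// Sketch of an alternative header: a 4-byte blob count followed by one
// 8-byte size per blob, all written explicitly as little-endian.
function encodeHeader(blobSizes) {
  const buffer = new ArrayBuffer(4 + 8 * blobSizes.length);
  const view = new DataView(buffer);
  view.setUint32(0, blobSizes.length, true); // true = little-endian
  blobSizes.forEach((size, i) => {
    view.setBigUint64(4 + 8 * i, BigInt(size), true);
  });
  return buffer;
}

// Expects a buffer holding the complete header (count + all sizes).
function decodeHeader(buffer) {
  const view = new DataView(buffer);
  const count = view.getUint32(0, true);
  const sizes = [];
  for (let i = 0; i < count; i++) {
    // Number() is exact here as long as a size stays below 2^53 bytes.
    sizes.push(Number(view.getBigUint64(4 + 8 * i, true)));
  }
  return sizes;
}

const header = encodeHeader([38, 5 * 1024 ** 3]);
console.log(decodeHeader(header));
```

In `generateBinary` the returned buffer would take the place of the `Uint32Array` entries in the `data` array, and `readBinary` would first slice 4 bytes to learn the count, then slice `4 + 8 * count` bytes for the whole header.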
Kaiido