
I have a JSON file of about 800 MB, and there is no way I can load it into memory, even with FileReader.readAsText (the result property comes back as an empty string). I don't think this is relevant, but the JSON file is an array of about 3.5 million small objects. Note that the file is picked by the user in the browser and never leaves it; all processing happens in the browser.

I tried Oboe.js and could stream the input in, but it stops after a while. Looking at the event, I guess Oboe keeps every JSON object it parses. Their browser version doesn't support streaming output either, if I understand correctly.

Is there any way to forward-read JSON? I don't mind losing the previous state, similar to .NET's Utf8JsonReader.
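Since the file is known to be a single top-level array of small objects, a forward-only parse could in principle be hand-rolled for this particular shape of data. Here is a minimal sketch of that idea (`ArrayElementScanner` and `push` are names I made up); it tracks brace depth and string state so it can extract complete elements from arbitrary text chunks, and it assumes valid JSON whose top-level array contains only objects:

```typescript
// A minimal forward-only scanner for a top-level JSON array of objects.
// It tracks brace depth plus string/escape state, so an element can be
// split across any number of chunks. Sketch only: assumes valid JSON and
// that every array element is an object.
class ArrayElementScanner {
    private pending = "";      // text of the element currently being assembled
    private depth = 0;         // current object-brace nesting depth
    private inString = false;  // are we inside a JSON string literal?
    private escaped = false;   // was the previous character a backslash?

    /** Feed one decoded text chunk; returns the complete elements it finished. */
    push(chunk: string): object[] {
        const out: object[] = [];
        for (const c of chunk) {
            if (this.depth > 0) this.pending += c;
            if (this.inString) {
                if (this.escaped) this.escaped = false;
                else if (c === "\\") this.escaped = true;
                else if (c === '"') this.inString = false;
                continue;
            }
            if (c === '"') {
                this.inString = true;
            } else if (c === "{") {
                if (this.depth === 0) this.pending = "{";
                this.depth++;
            } else if (c === "}") {
                this.depth--;
                if (this.depth === 0) {
                    out.push(JSON.parse(this.pending));
                    this.pending = ""; // only ever holds one element at a time
                }
            }
        }
        return out;
    }
}
```

Because `push()` returns only the elements that ended inside the chunk it was fed, memory stays proportional to one element rather than the whole file.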


Here's my current attempt with Oboe:

    async loadAsync(file: Blob) {
        const blocks: any[] = this.blocks = [];

        await new Promise<void>((resolve, reject) => {
            const ms = Date.now();
            const url = URL.createObjectURL(file);
            let counter = 0;

            oboe(url)
                .node("!.[*]", (block: any) => {
                    blocks.push(block);

                    counter++;
                    if (counter % 5000 === 0) {
                        this.logFn(`Loading: ${counter} items so far.`);
                    }
                })
                .done(() => {
                    URL.revokeObjectURL(url);
                    this.logFn(`Finished loading ${counter} blocks in ${Date.now() - ms}ms`);
                    resolve();
                })
                .fail((err: any) => {
                    URL.revokeObjectURL(url);
                    console.error(err);
                    this.logFn(err);
                    reject(err);
                });
        });
    }

This works well for small files, but for a big file it fails after about 250k items with this error:

(error screenshot)

Luke Vo
  • JS engines have a string max-length constant ([512 MB in V8](https://stackoverflow.com/questions/61271613/chrome-filereader-api-event-target-result/61316641#61316641)). But you can still read this file as text in a stream: simply call `File.stream()` and pass that to a `TextDecoderStream` where available, or pass each chunk of that stream to a normal `TextDecoder`, using the `{ stream: true }` option in its `decode()` method. However, for the JSON decoding... you'll have to write it yourself (or find someone who did). – Kaiido Aug 09 '21 at 01:19
  • @Kaiido Thanks for the limitation info. I knew about streams, but as you said, the problem is that I can't find any JSON parser that handles streaming/incomplete data besides Oboe. Unfortunately, the way it works, it stops after exceeding the limit. – Luke Vo Aug 09 '21 at 06:39
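The chunked-text half of Kaiido's suggestion can be sketched as below (the incremental JSON parsing on top of it is the part that still has to be written or found). `forEachTextChunk` is a name of my own; it relies only on the standard `Blob.stream()` and `TextDecoder` APIs:

```typescript
// Stream a user-picked File as text without building one giant string.
// TextDecoder with { stream: true } correctly handles multi-byte UTF-8
// sequences that straddle chunk boundaries.
async function forEachTextChunk(
    file: Blob,
    onChunk: (text: string) => void,
): Promise<void> {
    const reader = file.stream().getReader();
    const decoder = new TextDecoder("utf-8");
    for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        onChunk(decoder.decode(value, { stream: true }));
    }
    onChunk(decoder.decode()); // flush any bytes buffered at end of input
}
```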

0 Answers