
Goal:

  • In the browser, read a file from the user's file system as a base64 string
  • These files can be up to 1.5 GB

Issue:

  • The following script works perfectly fine in Firefox, regardless of the file size.
  • In Chrome, the script works fine for smaller files (I've tested files of ~5 MB).
  • If you pick a bigger file (e.g. 400 MB), the FileReader completes without an error or exception, but returns an empty string instead of the base64 string.

Questions:

  • Is this a Chrome bug?
  • Why is there neither an error nor an exception?
  • How can I fix or work around this issue?

Important:

Please note that chunking is not an option for me, since I need to send the full base64 string via POST to an API that does not support chunks.

Code:

'use strict';

var filePickerElement = document.getElementById('filepicker');

filePickerElement.onchange = (event) => {
  const selectedFile = event.target.files[0];
  console.log('selectedFile', selectedFile);

  readFile(selectedFile);
};

function readFile(selectedFile) {
  console.log('START READING FILE');
  const reader = new FileReader();

  reader.onload = (e) => {
    const fileBase64 = reader.result.toString();

    console.log('ONLOAD','base64', fileBase64);
    
    if (fileBase64 === '') {
      alert('Result string is EMPTY :(');
    } else {
      alert('It worked as expected :)');
    }
  };

  reader.onprogress = (e) => {
    console.log('Progress', ~~((e.loaded / e.total) * 100 ), '%');
  };

  reader.onerror = (err) => {
    console.error('Error reading the file.', err);
  };

  reader.readAsDataURL(selectedFile);
}

<!doctype html>
<html lang="en">

<head>
  <!-- Required meta tags -->
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <!-- Bootstrap CSS -->
  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0/dist/css/bootstrap.min.css" rel="stylesheet"
    integrity="sha384-wEmeIV1mKuiNpC+IOBjI7aAzPcEZeedi5yW5f2yOq55WWLwNGmvvx4Um1vskeMj0" crossorigin="anonymous">

  <title>FileReader issue example</title>
</head>

<body>

  <div class="container">
    <h1>FileReader issue example</h1>
    <div class="card">
      <div class="card-header">
        Select File:
      </div>
      <div class="card-body">
        <input type="file" id="filepicker" />
      </div>
    </div>

  </div>

  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0/dist/js/bootstrap.bundle.min.js"
    integrity="sha384-p34f1UUtsS3wqzfto5wAAmdvj+osOnFyQFpp4Ua3gs/ZVWx6oOypYoCJhGGScy+8"
    crossorigin="anonymous"></script>
  <script src="main.js"></script>
</body>

</html>
tmuecksch
  • I would really advise you to use FormData for multiple file uploads (with one file you can just send the file as-is); there is no limit on how large a file/blob you can upload that way (and you don't need to chunk it)... When you use `reader.readAsDataURL` you will waste lots of processing, RAM, time & bandwidth. – Endless May 11 '21 at 10:28
  • @Endless thanks for your input. Under regular circumstances I'd totally agree with you, but as stated in the question, I can't influence the API and have to go with base64 encoding. – tmuecksch May 12 '21 at 07:41
  • Regarding how to work around the issue, the only viable solution is to make the API accept binary data instead of a data:// URL. A theoretical one would be to stream-read the file, encode each chunk as base64 and pass them into a ReadableStream that would get uploaded to your server... except that POSTing ReadableStreams is still not possible... – Kaiido May 12 '21 at 08:02
  • @Kaiido Chrome supports posting ReadableStreams as a body using the Fetch API – Endless May 12 '21 at 10:14
  • @Endless with the Experimental Web Platform Features flag only, no? (I was just [trying this out](https://glitch.com/edit/#!/base64streamencoder?path=pages%2Findex.html%3A12%3A59) btw, and it seems to work with that flag on) but I lack the time to write an answer... – Kaiido May 12 '21 at 10:15
  • Hmm, yeah, I think so. Sometimes I forget that I have experimental flags on... – Endless May 12 '21 at 10:25
  • Thanks for your very valuable input, guys. I will try to get in touch with the creator of the API and convince them to change it based on your input. – tmuecksch May 12 '21 at 14:31

2 Answers


Is this a Chrome bug?

As I said in my answer to Chrome, FileReader API, event.target.result === "", this is a limitation of V8 (the JavaScript engine used by Chrome, but also by Node.js and others).
It is intentional and thus can't really qualify as "a bug".
The technicality is that what actually fails here is building a String of more than 512 MB (less the header) on 64-bit systems, because in V8 the size of a heap object must fit in a Smi (Small Integer) (cf. this commit).

Why is there neither an error nor an exception?

That might be a bug... As I also show in my linked answer, we get a RangeError when creating such a string directly:

const header = 24;
const bytes = new Uint8Array( (512 * 1024 * 1024) - header );
let txt = new TextDecoder().decode( bytes );
console.log( txt.length ); // 536870888
txt += "f"; // RangeError

And per step 3 of FileReader::readOperation, UAs have to

If package data threw an exception error:

  • Set fr’s error to error.
  • Fire a progress event called error at fr.

But here, we don't have that error, as this snippet demonstrates:

const bytes = Uint32Array.from( { length: 600 * 1024 * 1024 / 4 }, (_) => Math.random() * 0xFFFFFFFF );
const blob = new Blob( [ bytes ] );
const fr = new FileReader();
fr.onerror = console.error;
fr.onload = (evt) => console.log( "success", fr.result.length, fr.error );
fr.readAsDataURL( blob );

I will open an issue about this, since you should be able to handle that error from the FileReader.
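
In the meantime, since the failure is silent, the only way to guard against it is to check the result yourself. A minimal sketch of such a defensive check (assuming `blob` is the Blob being read):

const fr = new FileReader();
fr.onerror = console.error;
fr.onload = () => {
  // an empty result for a non-empty Blob means the data:// URL string
  // could not be built, because of the silent V8 limitation described above
  if( blob.size > 0 && fr.result === "" ) {
    console.error( "FileReader failed silently: file too big for a single string" );
  }
};
fr.readAsDataURL( blob );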

How can I fix or work around this issue?

The best fix is definitely to make your API endpoint accept binary resources directly instead of data:// URLs, which should always be avoided anyway.
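
For illustration, here is a minimal sketch of what sending the binary directly could look like (the /upload endpoint is a made-up placeholder):

// send the File from the <input> as-is, wrapped in a FormData
const formData = new FormData();
formData.append( "file", selectedFile );
fetch( "/upload", { method: "POST", body: formData } );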

If this is not doable, a solution "for the future" would be to POST a ReadableStream to your endpoint and to do the data:// URL conversion yourself, on a stream read from the Blob:

class base64StreamEncoder {
  constructor( header ) {
    if( header ) {
      this.header = new TextEncoder().encode( header );
    }
    this.tail = [];
  }
  transform( chunk, controller ) {
    const encoded = this.encode( chunk );
    if( this.header ) {
      controller.enqueue( this.header );
      this.header = null;
    }
    controller.enqueue( encoded );
  }
  encode( bytes ) {
    // prepend the tail left over from the previous chunk, and keep a new
    // tail so that every btoa() call sees a multiple of 3 bytes and never
    // emits mid-stream '=' padding (the tail must be computed over
    // tail + chunk, not over the chunk alone)
    const data = new Uint8Array( this.tail.length + bytes.length );
    data.set( this.tail );
    data.set( bytes, this.tail.length );
    const tail_length = data.length % 3;
    const last_index = data.length - tail_length;
    this.tail = data.subarray( last_index );
    let binary = "";
    for( let i = 0; i < last_index; i++ ) {
        binary += String.fromCharCode( data[ i ] );
    }
    const b64String = window.btoa( binary );
    return new TextEncoder().encode( b64String );
  }
  flush( controller ) {
    // force the encoding of the tail, with the final '=' padding
    if( this.tail.length ) {
      const binary = Array.from( this.tail )
          .reduce( (bin, byte) => bin + String.fromCharCode( byte ), "" );
      controller.enqueue( new TextEncoder().encode( window.btoa( binary ) ) );
    }
  }
}
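
Here is a sketch of how this transformer could be wired up and POSTed, assuming an endpoint able to consume the streamed body (/your-endpoint and the data:// header are placeholders, and remember that at the time of writing, uploading a ReadableStream with fetch only works in Chrome behind the Experimental Web Platform features flag):

const encoder = new base64StreamEncoder( "data:application/octet-stream;base64," );
const b64Stream = selectedFile.stream()
  .pipeThrough( new TransformStream( encoder ) );

fetch( "/your-endpoint", {
  method: "POST",
  body: b64Stream
} );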

Live example: https://base64streamencoder.glitch.me/

For now, you'd have to store the chunks of the base64 representation in a Blob, as demonstrated in Endless's answer below.

However, beware: since this is a V8 limitation, the server side can face the same issue with strings this big (for instance if it runs on Node.js), so either way you should contact your API's maintainer.
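
For instance, the very same limit can be reproduced in Node.js, which also runs on V8 (a quick sketch):

// Node.js: the max string length is the same V8 limit
const atLimit = "x".repeat( (512 * 1024 * 1024) - 24 ); // works
const tooBig = atLimit + "x"; // RangeError: Invalid string length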

Kaiido
  • FYI, I also tried to use `blob.stream()` with a Transformer, but the problem I stumbled upon was how base64 packs bytes into groups of 3, or something like that. Here is a demonstration that your code produces wrong base64: https://jsfiddle.net/8vbeqypo/ The solution could be to use an async ReadableStream pull source that uses FileReader to read x amount of bytes, instead of piping it via `blob.stream()`. Then you get rid of the transformer and you will be able to use streams in FF as well. – Endless May 16 '21 at 09:07
  • Wow, thanks @Endless. I thought about doing that the other day when I started playing with it, and completely forgot today when writing this answer (I didn't even change the code I left at that time...). Fixed now. Also, regarding FF, they are far from supporting posting ReadableStreams, and I hope they're closer to getting TransformStreams working. – Kaiido May 16 '21 at 11:26
  • You could transform the stream into a blob and post that... `new Response(readableStream).blob().then(uploadBlob)` – Endless May 16 '21 at 12:37

Here is a partial solution that transforms the blob chunk by chunk into base64 blobs, and then concatenates everything into one JSON blob, with a prefix/suffix part of the JSON around the base64 chunks in between.

Keeping it as a blob allows the browser to optimize the memory allocation and to offload it to the disk if needed.

You could try to change the chunkSize to something larger; browsers like to keep smaller blob chunks in memory (in one bucket).

// get some dummy gradient file (blob)
var canvas = document.createElement('canvas')
var ctx = canvas.getContext('2d')
canvas.width = canvas.height = 3000
var gradient = ctx.createLinearGradient(0, 0, 3000, 3000)
gradient.addColorStop(0, 'red')
gradient.addColorStop(1, 'blue')
ctx.fillStyle = gradient
ctx.fillRect(0, 0, canvas.width, canvas.height)
canvas.toBlob(main)

async function main (blob) {
  var fr = new FileReader()
  // Add 2 so the chunk size is divisible by 3 ((1 << 16) + 2 = 65538);
  // that way no chunk but the last one produces '=' padding
  var chunkSize = (1 << 16) + 2 
  var pos = 0
  var b64chunks = []
  
  while (pos < blob.size) {
    await new Promise(rs => {
      fr.readAsDataURL(blob.slice(pos, pos + chunkSize))
      fr.onload = () => {
        const b64 = fr.result.split(',')[1]
        // Keeping it as a blob allows the browser to offload memory to disk
        b64chunks.push(new Blob([b64]))
        rs()
      }
      pos += chunkSize
    })
  }

  // How you concatenate all the chunks into JSON is now up to you;
  // this solution/answer is more of a guideline of what you need to do.
  // There are ways to do it more automatically, but here is the
  // simplest form.
  // (fyi: this new blob won't create that much data in memory, it will only
  // keep reference points to the other blobs' locations)
  const jsonBlob = new Blob([
    '{"data": "', ...b64chunks, '"}'
  ], { type: 'application/json' })

  /*
  // I strongly advise you to tell the API developers
  // to add support for binary/file uploads (multipart/form-data).
  // base64 is roughly ~33% larger, and streaming
  // this data to the disk on the server is almost impossible.
  fetch('./upload-files-to-bad-json-only-api', {
    method: 'POST',
    body: jsonBlob
  })
  */
  
  // Just a test that it still works
  //
  // new Response(jsonBlob).json().then(console.log)
  fetch('data:image/png;base64,' + await new Blob(b64chunks).text()).then(r => r.blob()).then(b => console.log(URL.createObjectURL(b)))
}

I have avoided doing `base64 += fr.result.split(',')[1]` and `JSON.stringify`, since GiBs of data are a lot, and JSON shouldn't handle binary data anyway.

Endless
  • Would be good to explain the [core issue](https://stackoverflow.com/questions/61271613/chrome-filereader-api-event-target-result). It's not certain the endpoint will be able to handle that data anyway (e.g. if it's a Node server and it reads the payload as text). Also, `base64 += fr.result.split(',')[1]` wouldn't work anyway. – Kaiido May 12 '21 at 10:17
  • I tried `base64 += fr.result.split(',')[1]` and it works; I'm basically doing that already, but with blobs instead. He explained that it works in other browsers, so I figured the problem was not the server but rather Chrome itself. – Endless May 12 '21 at 10:22
  • `base64 += fr.result.split(',')[1]` would throw when `base64` reaches 512 MB. – Kaiido May 12 '21 at 10:36
  • Oh, I thought the issue was with the FileReader only – Endless May 12 '21 at 10:38
  • No, the issue is the max string length only. It has to fit in a Smi, and it's a V8 limitation, which is why I said it would be good to note that the server side may also fail (even if it may not apply to the OP now). – Kaiido May 12 '21 at 10:40
  • @Kaiido That is very useful information, thank you for bringing that up. I wasn't aware of the maximum string length; I had relied on the maximum string length defined in the ECMAScript standard, which is vast. – tmuecksch May 12 '21 at 14:21