
I have a requirement to display various files to users, primarily as plain text. However, users have the ability to upload arbitrary files, including binary files. Some of these files can be quite large, up to 10 MB in size.

The problem happens when I attempt to load a large binary file into an HTML page: it hangs the entire page for about a minute.

I've been able to replicate this issue on CodePen. The setup is very simple. For HTML:

<button id="load_json"> Load JSON </button>
<button id="load_binary"> Load Binary </button>

<pre id="display"></pre>

and for JS:

const createDataLoader = (url) => async () => {
  const response = await fetch(url);
  const text = await response.text();
  display.innerText = text;
};

load_json.addEventListener(
  "click",
  createDataLoader(
    "https://raw.githubusercontent.com/HKGx/repro/main/10mb-sample.json"
  )
);

load_binary.addEventListener(
  "click",
  createDataLoader("https://raw.githubusercontent.com/HKGx/repro/main/file.bin")
);

Additionally, I've uploaded example files to GitHub.

When presenting text-based files such as JSON or CSV, the display process is pretty smooth, taking just a second or two to render without noticeable stutters.

However, this smooth behavior changes when dealing with binary files. I've found only a partial workaround in avoiding loading the entire file at once, and even that is not perfect: scrolling through binary files feels choppy in a way that never happens with plain text files.
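
For reference, the partial workaround looks roughly like this. It's only a sketch; `CHUNK_SIZE` and `getChunk` are illustrative names of mine, not from any library:

```javascript
// Rough sketch of the workaround: keep the response as a Blob and decode
// only one chunk at a time, instead of assigning the whole 10 MB string
// to the element in one go.
const CHUNK_SIZE = 64 * 1024; // 64 KiB per "page"; arbitrary

const getChunk = async (blob, page) => {
  const start = page * CHUNK_SIZE;
  // Blob.slice() is cheap: the bytes are only read when .text() resolves.
  return blob.slice(start, start + CHUNK_SIZE).text();
};

// In the page this would be used as, e.g.:
//   const blob = await (await fetch(url)).blob();
//   display.innerText = await getChunk(blob, 0);
```

This keeps the DOM update small, but as noted above the rendering of each binary chunk is still noticeably slower than the equivalent amount of plain text.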

EDIT: I have noticed that the above works SIGNIFICANTLY better on Firefox, taking only a few seconds instead of minutes.

HKG
  • I am pretty convinced you are also going to have this issue if you load in a 10Mb JSON file and display it - it's just a lot for the renderer to process? – somethinghere Aug 17 '23 at 13:18
  • 1
    @somethinghere, The provided JSON file is 8MBs (even though the name states 10mb) in size and it doesn't break the renderer. Before actually pushing it we tested it with multiple large XMLs, JSONs, and Lorem ipsums. – HKG Aug 17 '23 at 13:21
  • 2
    You have to "Paginate" the data in some way. Maybe write a proxy which can give you a chunk of the data at a time? And then some js code to support it and show only a chunk of that info at a time. I'm not sure if that will go against the business requirements. – Shuvojit Aug 17 '23 at 13:21
  • 2
    Note that to paginate, you can store your data as a Blob, and then `.slice()` it and get only the `.text()` of that chunk. – Kaiido Aug 17 '23 at 13:22
  • Yes, I'm currently paginating the files, but I want to believe there is a better way to handle this. A paginated binary file is still slower to render than a plain text file. @Shuvojit – HKG Aug 17 '23 at 13:24
  • I'm just trying to understand the slowness. Some piece of code converts the binary into renderable UTF, right? Is that code introducing the slowness? – Shuvojit Aug 17 '23 at 13:27
  • "When presenting text-based files such as JSON or CSV, the display process is pretty smooth, taking just a second or two to render without noticeable stutters" - your JSON is also 100 times smaller than your binary. – mbojko Aug 17 '23 at 13:28
  • 1
    @mbojko it is not, the binary file is 5_120 kbs in size, the json file is 8_182 kbs in size. You might have mistaken that fact that the JSON file might be served gzipped. – HKG Aug 17 '23 at 13:32
  • @Shuvojit When I tried using a profiler, it just told me that the browser's "rendering" was taking a long time. At first I thought it might be related to layout thrashing (because in the background we had a few synchronous layout calls), but fixing the thrashing didn't improve things in the slightest and the page still froze. – HKG Aug 17 '23 at 13:32
  • 1
    With arbitrary binary data, you are probably feeding the browser with _a lot_ of code sequences that are not valid in the current character encoding, which it has to replace with the � character ... so I'm guessing that's probably what is so time consuming here. Maybe trying to "filter" the fetched data in that regard - looking for invalid byte sequences, and replacing those with the actual � _character_, before you assign this as the innerText, might help ...? – CBroe Aug 17 '23 at 13:54
  • Actually, `response.text()` is done quite fast in both cases, _displaying_ it is what takes ages. Might be the browser has under the hood some optimisation mechanisms for displaying ASCII (-ish) texts, which don't work for binary file converted to text? – mbojko Aug 17 '23 at 14:04
  • 1
    It would have to load more fonts from the system for sure, but that shouldn't be *that* noticeable. FWIW, I can repro on Chrome but not on Firefox, looks like a bug – Kaiido Aug 17 '23 at 14:23
  • 1
    That would be something related to the text layout, changing your `
    ` to `white-space: normal` it gets responsive about ten times faster (very rough measurement) https://jsfiddle.net/Lzktmceu/ Even though it's still a lot slower than other browsers. You may want to open an issue at https://bugs.chromium.org/p/chromium/issues/wizard (It still reproduces in the latest Canary) I'd suspect the width of the rendering canvas would be for something, but that's just guesswork.
    – Kaiido Aug 17 '23 at 14:45
  • Thank you @Kaiido for checking further :) I opened an issue at https://bugs.chromium.org/p/chromium/issues/detail?id=1473680 – HKG Aug 17 '23 at 15:38
  • And to confirm a bit that it's related to the canvas size: disabling hardware acceleration makes Chrome more on par with Firefox. – Kaiido Aug 18 '23 at 00:05
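
Regarding CBroe's suggestion above: decoding the bytes yourself makes the replacement step explicit. A minimal sketch, assuming the default lossy behavior of `TextDecoder` (invalid byte sequences become actual U+FFFD characters, which is also what `response.text()` does under the hood); `decodeLossy` is my own name for it:

```javascript
// Decode arbitrary bytes as UTF-8, replacing invalid byte sequences with
// the actual U+FFFD replacement character. This is TextDecoder's default
// behavior (i.e. { fatal: false }), so no manual filtering is needed.
const decodeLossy = (bytes) => new TextDecoder("utf-8").decode(bytes);

// Example: "hi" followed by an invalid byte.
// decodeLossy(new Uint8Array([0x68, 0x69, 0xff])) → "hi\uFFFD"
```

In practice, per Kaiido's rough measurement above, relaxing the `<pre>`'s white-space (e.g. `display.style.whiteSpace = "normal"`) had a far larger effect than anything done to the string itself, which points at Chrome's text layout rather than the decoding step.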

0 Answers