
I am downloading a large number of files using the following method and I am concerned about its memory usage.

Chrome's Blob Storage System Design documentation mentions the following.

If the in-memory space for blobs is getting full, or a new blob is too large to be in-memory, then the blob system uses the disk. This can either be paging old blobs to disk, or saving the new too-large blob straight to disk.

However, even after going through the documentation multiple times, I still have the following concerns:

  1. I am still unsure whether the use of fetch affects this behavior and loads the data into memory first.
  2. If fetch does in fact alter this behavior, is there a recommended file-size limit for this method (beyond which files shouldn't be downloaded)?
  3. What would the behavior be in other (non-Chromium-based) browsers?
const download = downloadLinks => {

  const _download = async ( downloadLink ) => {

    const blobURL = await fetch(downloadLink, {  
      responseType: 'blob'  
    })
    .then(res => res.blob())
    .then(blob => window.URL.createObjectURL(blob))
 
    const fileName = downloadLink.substr(downloadLink.lastIndexOf('/'))
    
    const a = document.createElement('a')  
    a.href = blobURL
    a.setAttribute('download', fileName)  
    document.body.appendChild(a)  
    a.click()
    a.remove()  
    
    window.URL.revokeObjectURL(blobURL)
  }

  const downloadInterval = () => {

    if (downloadLinks.length == 0) return

    const url = downloadLinks.pop()
    
    _download(url)
    
    if (downloadLinks.length !== 0) setTimeout(downloadInterval, 500)

  }

  setTimeout(downloadInterval, 0)
}

Here are some of the resources that I went through. They answer parts of all three questions, but I am still mainly concerned about whether fetch causes the Blob to be loaded into memory first.

  • Why do you even go through fetch? If I read your code correctly, it first fetches a resource at some address and then generates a blob: URI pointing to that fetched data. Given this process is limited by the same-origin policy, why not point your anchor directly at the resource to be fetched? No need for a blob: URI at all here. (Also, if you are really curious: when you do `fetch(url).then(r => r.blob())`, the whole data has to be fetched as a ReadableStream and stored inside an ArrayBuffer. Only when the whole request is complete does the ArrayBuffer get copied into a Blob.) – Kaiido Jun 01 '21 at 04:22
  • @Kaiido Firstly, this method ensures that browsers download filetypes like txt, pdf, mp4, HTML, etc. instead of opening them. Secondly, it ensures that we can loop through (or set an interval over, in this case) an array of downloadLinks and create anchor elements for the blob URLs. It turns out that you can't do that with `http[s]://*` type URLs, while you can with `blob:*` type URLs. I have talked about this behavior in https://stackoverflow.com/questions/66666415/how-to-stop-automatic-cancellation-of-downloads-when-downloading-multiple-files – Zhang Chandra Jun 02 '21 at 13:17
  • @Kaiido I also noticed that using the fetch API isn't the best way to do what I am trying to do. As you pointed out, fetch first loads data into memory as a ReadableStream and only then converts it into a Blob object. Therefore, I am now using an XHR client. Setting `xhr.responseType = 'blob'` ensures that the data is fetched/loaded as a blob. – Zhang Chandra Jun 02 '21 at 13:22

1 Answer


The short answer is yes!

  1. Fetch does in fact change this behavior: it first reads the data as a ReadableStream, which is buffered entirely in memory before the Blob is created. Therefore, use the code below instead.

  2. The largest file this method can download depends on the disk size, OS, and browser; there isn't an exact number that works for all systems. This question has been answered in detail here.

  3. "No apparent hard limit. I am able to create Blob's significantly larger than the "800 MiB" FileSaver.js claims. It does not use disk space to back larger blobs, so it all goes in memory, potentially with the operating system paging memory to disk. This means that blobs larger than memory may be possible, though likely with bad performance."

    Follow the link in number 2 and here for more details.
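To illustrate the buffering behavior described in the comments, here is a minimal sketch (using a synthetic `Response` built from an in-memory buffer rather than a real network request, so it runs without network access in Node 18+ or any modern browser): `res.blob()` only resolves after the entire body has been read into memory, whereas reading `res.body` with a `ReadableStream` reader lets each chunk be handled as it arrives.

```javascript
// Contrast res.blob() (buffers the whole body) with streaming the
// body chunk by chunk via res.body.getReader().
const data = new Uint8Array(1024 * 1024); // 1 MiB payload

async function buffered() {
  const res = new Response(data);
  // blob() resolves only once the entire body is in memory.
  const blob = await res.blob();
  return blob.size;
}

async function streamed() {
  const res = new Response(data);
  const reader = res.body.getReader();
  let received = 0;
  // Each chunk can be processed (or written out) as it arrives,
  // without holding the whole payload in memory at once.
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    received += value.length;
  }
  return received;
}

buffered().then(size => console.log('blob size:', size));
streamed().then(size => console.log('streamed bytes:', size));
```

Both paths read the same number of bytes; the difference is purely in when and where those bytes are held.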

As @Kaiido mentions in the comments, and as I also discovered by running a few tests, if you are expecting a large file and want to take advantage of the Blob architecture (letting the browser save the data directly to disk where possible), the above code can be modified as follows.

const download = downloadLinks => {

    const _download = url => {

        const xhr = new XMLHttpRequest()
        xhr.open('GET', url)
        // With responseType = 'blob', the browser hands the response to
        // the blob storage system as it arrives instead of buffering it
        // in an ArrayBuffer first.
        xhr.responseType = 'blob'
        xhr.onload = () => {
            // Strip everything up to and including the last '/'
            const fileName = url.substr(url.lastIndexOf('/') + 1)
            const blobURL = window.URL.createObjectURL(xhr.response)
            const a = document.createElement('a')
            a.href = blobURL
            a.setAttribute('download', fileName)
            document.body.appendChild(a)
            a.click()
            a.remove()
            window.URL.revokeObjectURL(blobURL)
        }
        xhr.send(null)

    }

    const downloadInterval = () => {

        if (downloadLinks.length == 0) return

        const url = downloadLinks.pop()
        
        _download(url)
        
        if (downloadLinks.length !== 0) setTimeout(downloadInterval, 500)

    }

    setTimeout(downloadInterval, 0)
}

The difference here is the line xhr.responseType = 'blob'. The fetch call in the original question passes a responseType option too, but it has no effect: the Fetch API simply doesn't support that option, so it is silently ignored.
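You can verify this ignore-unknown-options behavior directly (a quick sketch; the URL is just a placeholder): the Fetch API discards init members it doesn't recognize, which is why the original code ran without any error while doing nothing different.

```javascript
// The Fetch API ignores unknown members of the init object, so passing
// responseType (an XHR-only setting) raises no error and has no effect.
const req = new Request('https://example.com/file.bin', {
  responseType: 'blob', // silently ignored
});

// Nothing on the resulting Request reflects the option.
console.log('responseType' in req); // false
```

This silent-ignore behavior is standard WebIDL dictionary handling, which is also why no warning ever appeared in the console.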
