
Problem: I need to download a lot of files (each can be up to 2 GB) and send them as a zip to the client requesting the resource. I was looking to parallelize that operation. Currently, the library I have to use takes an OutputStream (in my case a ZipOutputStream) as input and writes the content of the downloaded files to that output stream using Apache Commons IOUtils.

The problem with this approach is that everything happens sequentially: I have to download one file, write it to the OutputStream, create a new ZipEntry, close that entry, and then move on to the next one. I wanted to do the download operation in parallel and, if possible, write to the ZipOutputStream in parallel as well.
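
For context, the current flow looks roughly like this (a sketch only; `FileDownloader` and `downloadTo` are placeholders for the library I have to use, which writes the downloaded content straight to the OutputStream it is given):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SequentialZip {

    // Placeholder for the library I have to use.
    interface FileDownloader {
        void downloadTo(String resourceName, OutputStream out) throws IOException;
    }

    static void writeZip(List<String> resourceNames,
                         FileDownloader downloader,
                         ZipOutputStream zipOut) throws IOException {
        for (String name : resourceNames) {
            zipOut.putNextEntry(new ZipEntry(name)); // start the entry for this file
            downloader.downloadTo(name, zipOut);     // download blocks here, then writes
            zipOut.closeEntry();                     // must finish before the next file starts
        }
        zipOut.finish();
    }
}
```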

Has anyone faced this before?

I was planning to change the signature of that library to return an InputStream for the downloaded resource, so that I can fetch the contents in parallel and then create a new ZipEntry for each of those streams in sequence. That way I would at least be able to download the files in parallel. The way I'm creating the ZipOutputStream is as follows:

ZipOutputStream zo = new ZipOutputStream(httpServletResponse.getOutputStream());
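
Roughly what I had in mind, as an untested sketch: `downloadAsStream` stands for the changed library signature, downloads run on an ExecutorService, and the zip entries are still written one at a time in the original order. (For the downloads to really progress in parallel, the returned streams would have to be backed by something like temp files or buffers.)

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.commons.io.IOUtils;

public class ParallelDownloadSequentialZip {

    // Placeholder for the changed library call that returns the downloaded
    // content as an InputStream.
    interface FileDownloader {
        InputStream downloadAsStream(String resourceName) throws Exception;
    }

    static void writeZip(List<String> resourceNames,
                         FileDownloader downloader,
                         ZipOutputStream zipOut) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Kick off all downloads in parallel.
            List<Future<InputStream>> futures = new ArrayList<>();
            for (String name : resourceNames) {
                futures.add(pool.submit(() -> downloader.downloadAsStream(name)));
            }
            // Write the zip entries one at a time, in the original order.
            for (int i = 0; i < resourceNames.size(); i++) {
                try (InputStream in = futures.get(i).get()) {
                    zipOut.putNextEntry(new ZipEntry(resourceNames.get(i)));
                    IOUtils.copy(in, zipOut);
                    zipOut.closeEntry();
                }
            }
            zipOut.finish();
        } finally {
            pool.shutdown();
        }
    }
}
```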

Any thoughts?

User5817351
  • Because you don't know if the underlying format can support concurrent writes, OutputStream (and ZipOutputStream) requires you to write sequentially. – Kylar Aug 25 '16 at 17:49
  • Is there any other way to do that in parallel, maybe not using ZipOutputStream but some other stream that can write in parallel? – User5817351 Aug 25 '16 at 17:56
  • Use a producer-consumer pattern: the consumer takes downloaded files from a blocking queue and writes them to the output stream, while the producers, in other threads, download the files and put them in the queue (a sketch follows these comments). – JB Nizet Aug 25 '16 at 18:21
  • Would [that](http://www.pixeldonor.com/2013/oct/12/concurrent-zip-compression-java-nio/) help? – ikaerom May 30 '17 at 20:56
  • Why? The network isn't parallel, and the operation is output-bound. There is no advantage to this complication. – user207421 Jun 14 '20 at 03:10
  • Looks like there isn't a way to write in parallel; you can save some time by preparing the files to be sent in parallel, then take a lock and send the data to avoid concurrency exceptions. – suhas0sn07 Aug 31 '21 at 04:45
  • @user207421 In a modern datacenter, the time to send the file (network) is way faster than the time to compress it (CPU), by a factor of 10x or more. There would be tremendous advantages to adding more CPU threads. – Darren Aug 15 '22 at 13:30
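
A minimal sketch of the producer-consumer idea from JB Nizet's comment, under some assumptions: `downloadToTempFile` is a hypothetical helper standing in for the real download call, finished downloads are spooled to temp files, producer threads put them on a bounded blocking queue, and a single consumer writes the zip entries (ZipOutputStream requires the writes to stay sequential):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ProducerConsumerZip {

    // A downloaded file waiting to be zipped: entry name plus a temp file on disk.
    static final class Downloaded {
        final String name;
        final Path tempFile;
        Downloaded(String name, Path tempFile) { this.name = name; this.tempFile = tempFile; }
    }

    // Hypothetical helper standing in for the real download call.
    static Path downloadToTempFile(String resourceName) throws IOException {
        Path tmp = Files.createTempFile("download-", ".tmp");
        // ... fetch resourceName and write its bytes into tmp ...
        return tmp;
    }

    static void writeZip(List<String> resourceNames, ZipOutputStream zipOut)
            throws IOException, InterruptedException {
        // Bounded queue so at most a few finished downloads sit on disk at once.
        BlockingQueue<Downloaded> queue = new LinkedBlockingQueue<>(4);
        ExecutorService producers = Executors.newFixedThreadPool(4);

        // Producers: download in parallel and hand each finished file to the queue.
        for (String name : resourceNames) {
            producers.submit(() -> {
                queue.put(new Downloaded(name, downloadToTempFile(name)));
                return null;
            });
        }
        producers.shutdown();

        // Single consumer: writes entries one at a time, in whatever order the
        // downloads happen to finish.
        for (int written = 0; written < resourceNames.size(); written++) {
            Downloaded d = queue.take();
            zipOut.putNextEntry(new ZipEntry(d.name));
            Files.copy(d.tempFile, zipOut);
            zipOut.closeEntry();
            Files.delete(d.tempFile);
        }
        zipOut.finish();
    }
}
```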
