I know that when I write a new file to a folder that ends in ".zip" it compresses the file. This is when using BufferedOutputStream in Java and saving to a Windows file system. I'm saving these files to a network drive, so the write time is dependent on network speed.

Will saving to a .zip folder speed up write time? In other words, does it transfer the data uncompressed and then compress it (so it wouldn't speed up write time), or does it compress the data and then write out the file? Sorry if this is an ignorant question.

Chris
  • Try it and see, by writing some data with and without compression, and measuring the time. – nanofarad May 28 '16 at 00:18
  • A good way to minimize network load is to use [GZIPOutputStream](https://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html) to compress the data before sending it over the network, then uncompress it explicitly with [GZIPInputStream](https://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPInputStream.html) on the other side as you read the stream. (GZIP compresses better than ZIP). – Majora320 May 28 '16 at 00:19
  • Just using ".zip" as the file extension doesn't automatically compress the contents... You need to explicitly compress the file from Java. – SamTebbs33 May 28 '16 at 00:19
  • When I view the folder in Windows, it says it's compressed. It gives the actual size and the compressed size. – Chris May 28 '16 at 00:21
  • BufferedOutputStream does not compress. The .zip folder may be an NTFS compressed folder, perhaps? That'd mean you're sending it uncompressed over the network, and the host will compress it on the fly. That'll probably make it *slower*, if anything. The data transfer over the network would take as much time as before, since it's still sending uncompressed bytes. The host that writes the file now compresses it first, then writes the compressed bytes. It's quite likely the compression will be the bottleneck and cost more time than is saved in the write operation. – Arjan May 28 '16 at 00:47
  • Furthermore, the total time of the write operations depends on so many things. Where do you do the compression? On the PC that sends the file, or on the host that writes it? Do they have fast CPUs? What's the available bandwidth of the network? What's the write speed of the storage device? So I'm voting to close this question. – Arjan May 28 '16 at 01:05
  • @Arjan - Someone with rep 125 does not have rights to "vote to close". – Stephen C May 28 '16 at 02:04
  • @StephenC Correct, I meant flagging it. Thank you. – Arjan May 28 '16 at 02:06

1 Answer

There are so many misconceptions in the question that I think it is worth going through them one at a time.

> I know that when I write a new file to a folder that ends in ".zip" it compresses the file.

That is not correct. Creating a file with a ".zip" suffix does not automatically make it compressed. Writing files to a directory that has ".zip" as its filename suffix (?!?) doesn't either. Not in Java. Not in other languages.

In order to get compression, the application needs to take steps to make this happen. In Java you could use ZipOutputStream to write a file in ZIP file format. However, ZIP is actually an "archive" format that is designed to hold multiple files. If you are simply trying to compress a single file, there are better alternatives; e.g. GZIPOutputStream.
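
As a minimal sketch of the single-file case, the code below wraps the file stream in a GZIPOutputStream so that the application itself does the compressing. The class name `GzipFileSketch`, the method names and the temporary file are hypothetical, chosen only for illustration; a path on your network share would be used the same way.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipFileSketch {

    // Compress in the application, so only compressed bytes reach the file system.
    static void writeGzip(Path target, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            out.write(data);
        } // closing the stream writes the GZIP trailer and flushes the buffer
    }

    // Read the file back, decompressing explicitly.
    static byte[] readGzip(Path source) throws IOException {
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(source)));
             ByteArrayOutputStream result = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
            return result.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target file, used here only so the sketch runs anywhere.
        Path target = Files.createTempFile("report", ".txt.gz");

        // Repetitive sample text, which compresses well.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            text.append("the same line repeated\n");
        }
        byte[] data = text.toString().getBytes(StandardCharsets.UTF_8);

        writeGzip(target, data);
        System.out.println(data.length + " bytes in, " + Files.size(target) + " bytes on disk");
        Files.delete(target);
    }
}
```

Note that the reader has to decompress explicitly as well; nothing about the ".gz" suffix does that automatically.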

(It is also possible that this so-called "ZIP folder" you are talking about is a normal ZIP file that has been "mounted" as a loopback file system. You / someone else would have had to set that up explicitly. Anyhow, if this is what is going on here, it is nothing to do with Java. It is all happening in external software and in the operating system where the ZIP is "mounted".)

> This is when using BufferedOutputStream in Java and saving to a Windows file system.

Erm ... no. See above. However, you are correct that it may be better to use a BufferedOutputStream to write files, though it only really helps if your application is writing the files in small chunks; e.g. a byte at a time. (Stream compression complicates the issue, so it is difficult to give a simple, general answer on this.)
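
To illustrate the point about small chunks, here is a rough sketch that counts how many write calls actually reach the underlying stream when bytes are written one at a time. `CountingOutputStream` and `countWrites` are hypothetical helpers written for this example, not part of the JDK, and an in-memory stream stands in for the real file.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class BufferingSketch {

    // Hypothetical helper: counts how many write calls reach the wrapped stream.
    static final class CountingOutputStream extends OutputStream {
        private final OutputStream delegate;
        long calls;

        CountingOutputStream(OutputStream delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(int b) throws IOException {
            calls++;
            delegate.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            calls++;
            delegate.write(b, off, len);
        }

        @Override
        public void flush() throws IOException {
            delegate.flush();
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    // Writes 'total' bytes one at a time and returns the number of write
    // calls that reached the underlying stream.
    static long countWrites(boolean buffered, int total) throws IOException {
        CountingOutputStream sink = new CountingOutputStream(new ByteArrayOutputStream());
        try (OutputStream out = buffered ? new BufferedOutputStream(sink) : sink) {
            for (int i = 0; i < total; i++) {
                out.write(i & 0xFF);
            }
        }
        return sink.calls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unbuffered calls: " + countWrites(false, 10_000));
        System.out.println("buffered calls:   " + countWrites(true, 10_000));
    }
}
```

On a network drive each call that reaches the file stream can be expensive, which is why batching matters most there. If the application already writes large byte arrays in one call, the buffer adds little.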

> I'm saving these files to a network drive, so the write time is dependent on network speed.

Correct. It is also dependent on network latency, the protocols used and the load on the remote file server. (If you have a ZIP "mounted", then that is going to add overheads too.)

> Will saving to a .zip folder speed up write time?

Maybe. See above. It depends what you mean by a ZIP folder.

Ignoring that, writing the files (the right way) in compressed and / or archive form from Java may speed up writes. There are actually two things to consider:

  • For plain compression, you are trading off the time it takes the application (!!) to compress and decompress the data against the time (and disk space) you are saving by moving and storing fewer bytes.

  • For ZIP files (and similar archive formats) there is a second potential saving. Storing and retrieving lots of individual small files from a file system is slow compared with storing and retrieving a single ZIP file containing those files.

And if you are looking for optimal compression, then ZIP is not the best option.
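
The second saving (one archive instead of many small files) could look roughly like the sketch below, which writes several in-memory "files" into a single ZIP with ZipOutputStream. The class name `ZipArchiveSketch`, the entry names and the temporary archive path are all made up for the example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipArchiveSketch {

    // Writes every (name, content) pair as one entry of a single ZIP file.
    static void writeZip(Path target, Map<String, byte[]> files) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target)))) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
    }

    // Lists the entry names of an existing ZIP file.
    static List<String> entryNames(Path archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical small files; in practice these would be your own data.
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "first small file".getBytes(StandardCharsets.UTF_8));
        files.put("b.txt", "second small file".getBytes(StandardCharsets.UTF_8));

        Path archive = Files.createTempFile("bundle", ".zip");
        writeZip(archive, files);
        System.out.println("entries: " + entryNames(archive));
        Files.delete(archive);
    }
}
```

The file server then sees one create-and-write sequence for the archive rather than one per small file. Whether that is a net win depends on how the files are read back later.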

> In other words, does it transfer the data uncompressed and then compress it (so it wouldn't speed up write time), or does it compress the data and then write out the file?

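There are so many variables that it is hard to say for sure. What matters is where the compression happens. If the data is compressed on the client side (in your Java application, or by software on the client that "mounts" the ZIP), the bytes are sent over the network in compressed form. If the compression is done by the file server (for example, for an NTFS compressed folder), the bytes are sent uncompressed and are compressed on arrival, so the network transfer is no faster.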
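
If you compress in the application, you can check for yourself how many bytes would be handed to the network share and how long the compression takes. The following is only a rough sketch: `CompressionTradeoffSketch` and its sample data are invented for the example, and a single `System.nanoTime()` measurement is not a proper benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

class CompressionTradeoffSketch {

    // Compresses data in memory and returns the bytes that would be written out.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        }
        return buffer.toByteArray();
    }

    // Prints the size before and after compression and the time it took.
    static void report(String label, byte[] sample) throws IOException {
        long start = System.nanoTime();
        byte[] packed = gzip(sample);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(label + ": " + sample.length + " -> " + packed.length
                + " bytes, compressed in about " + micros + " microseconds");
    }

    public static void main(String[] args) throws IOException {
        // Two made-up extremes: all zeros (very compressible) and
        // pseudo-random bytes (essentially incompressible).
        byte[] repetitive = new byte[1_000_000];
        byte[] random = new byte[1_000_000];
        new Random(42).nextBytes(random);

        report("repetitive", repetitive);
        report("random", random);
    }
}
```

Data that is already compressed (images, video, existing archives) tends to behave like the random sample: you pay the CPU time and send about as many bytes as before.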


Finally, I would advise you NOT to try to combine mounted ZIP files and network shares:

  • The combination of the two could potentially interact in ways that make performance worse.

  • There is a risk that you will end up with a corrupted ZIP or lost files if the network share goes offline at an inconvenient point.

Stephen C