I have a valid zip file on my classpath (Java 8). It is 302617 bytes long. I would like to copy it to a temp folder for expansion and further processing in my application, using IOUtils from Apache Commons IO. If I read it as a file, e.g.:
    File out = new File("out.zip");
    File in = new File("src/main/resources/StartUpData/c4.zip");
    try (InputStream is = new FileInputStream(in);
         FileOutputStream fos = new FileOutputStream(out)) {
        IOUtils.copy(is, fos);
        System.out.println(out.length());
    }
this works exactly as expected, printing 302617.
If, however, I read it from a classpath input stream:
    try (InputStream is2 = this.getClass().getResourceAsStream("/StartUpData/c4.zip");
         FileOutputStream fos = new FileOutputStream(out)) {
        IOUtils.copy(is2, fos);
        System.out.println(out.length());
    }
it generates a file of 544115 bytes. The result is not a valid zip file: it cannot be unzipped or read as a zip file by any command-line zip utility or by Java. I only observe this behaviour with zip files; for other binary files or images, both approaches work fine.
I investigated the bytes being read in both cases. Here are the first 12 bytes of the file, from xxd -b c4.zip:
00000000: 01010000 01001011 00000011 00000100 00010100 00000000 PK....
00000006: 00001000 00001000 00001000 00000000 10111010 10011110 ......
The 11th and 12th bytes in the file (10111010 10011110, hex ba 9e) are read from the classpath input stream as hex ef bf.
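For anyone who wants to repeat the comparison, here is a minimal sketch of how the first differing offset can be located. The class name ByteDiff and its helper methods are hypothetical, and the two arrays in main are stand-in data built from the 12 bytes shown above; in the real case they would come from something like Files.readAllBytes on the source file and IOUtils.toByteArray on the classpath stream.

```java
class ByteDiff {
    /** Returns the index of the first differing byte, or -1 if the arrays are identical. */
    public static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        // Same common prefix: either identical, or one array is longer.
        return a.length == b.length ? -1 : n;
    }

    /** Formats up to count bytes starting at from as space-separated hex pairs. */
    public static String hex(byte[] data, int from, int count) {
        StringBuilder sb = new StringBuilder();
        int end = Math.min(data.length, from + count);
        for (int i = from; i < end; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data: the first 12 bytes reported by xxd for c4.zip.
        byte[] fromFile = {
            0x50, 0x4b, 0x03, 0x04, 0x14, 0x00,
            0x08, 0x08, 0x08, 0x00, (byte) 0xba, (byte) 0x9e
        };
        // Stand-in data: the same bytes with the 11th and 12th as seen via the classpath stream.
        byte[] fromClasspath = fromFile.clone();
        fromClasspath[10] = (byte) 0xef;
        fromClasspath[11] = (byte) 0xbf;

        int offset = firstMismatch(fromFile, fromClasspath);
        System.out.println("first mismatch at offset " + offset
                + ": expected " + hex(fromFile, offset, 2)
                + ", got " + hex(fromClasspath, offset, 2));
    }
}
```

Offsets here are zero-based, so the 11th byte of the file is offset 10.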
In fact, any byte with its most significant bit set to 1 appears to be misread by the input stream created by
this.getClass().getResourceAsStream("/StartUpData/c4.zip")
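One observation that may help narrow this down: ef bf bd is the UTF-8 encoding of U+FFFD, the Unicode replacement character. The sketch below is only a hypothesis check, assuming that some step decodes the bytes as UTF-8 text and re-encodes them; it does not show where such a step would occur. The class name Utf8RoundTrip and the roundTrip helper are hypothetical, and the input is the 12-byte header shown above.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    /**
     * Decodes raw bytes as UTF-8 text and encodes the text back to bytes.
     * new String(byte[], Charset) replaces malformed input with U+FFFD,
     * so this round trip is lossy for arbitrary binary data.
     */
    public static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first 12 bytes of c4.zip as reported by xxd.
        byte[] header = {
            0x50, 0x4b, 0x03, 0x04, 0x14, 0x00,
            0x08, 0x08, 0x08, 0x00, (byte) 0xba, (byte) 0x9e
        };
        byte[] mangled = roundTrip(header);

        StringBuilder sb = new StringBuilder();
        for (byte b : mangled) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(header.length + " bytes in, " + mangled.length + " bytes out");
        System.out.println(sb.toString().trim());
    }
}
```

In this sketch the ten ASCII bytes pass through unchanged, while ba and 9e (both lone UTF-8 continuation bytes) each become the three bytes ef bf bd. That would match both the ef bf seen at the 11th and 12th positions and a copy that is larger than the original.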
Does anyone know why this happens only for zip files being read from the classpath? How can 10111010 10011110 be interpreted as ef bf? Many thanks for any suggestions. I am using macOS High Sierra; my colleague also observes this behaviour on Windows 10.