
I have a very large encrypted zip file (2.5 GB). In production I can't decrypt the whole file into memory and unzip it there, so I'm trying to use streams to limit the amount of memory used.

I've hooked up the following to do it (error handling and stream closing left out for clarity):

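SecretKeySpec keySpec = new SecretKeySpec(myKey, "AES");
Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
// The IV is not shown in the original post; myIv is a placeholder for it.
cipher.init(Cipher.DECRYPT_MODE, keySpec, new IvParameterSpec(myIv));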

FileInputStream fis = new FileInputStream(new File(pathToEncryptedFile));
CipherInputStream cis = new CipherInputStream(fis, cipher);

ZipInputStream zis = new ZipInputStream(new BufferedInputStream(cis));
ZipEntry ze = null;
while ((ze = zis.getNextEntry()) != null) {
    String filename = ze.getName();
    System.out.println("Found zip entry: " + filename);
}

This works for about 50% of my files, even though they're all zipped and encrypted the same way. For the rest, the while() loop that reads the zip entries throws this exception:

java.util.zip.ZipException: unknown format (EXTSIG=f23f1090)
  at java.util.zip.ZipInputStream.readAndVerifyDataDescriptor(ZipInputStream.java:196)
  ...

If I decrypt the entire file to a byte buffer and write it to disk, then use ZipInputStream on the file, it works for all my test files.

It seems like the extra padding at the end of the encrypted file is causing some problems when trying to use streams, but I thought the "PKCS5Padding" specification would take care of that.
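One way to test the padding theory in isolation is a small round trip. The sketch below is not the code from the question: the all-zero key and IV and the class name `PaddingRoundTrip` are assumptions made for the demo. It encrypts a buffer whose length is not a multiple of 16, decrypts it through a CipherInputStream, and checks that the padding was removed.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.Arrays;

class PaddingRoundTrip {
    // Demo key and IV only (all zeros); these are assumptions, not real key material.
    static final byte[] KEY = new byte[16];
    static final byte[] IV = new byte[16];

    static Cipher newCipher(int mode) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
        return cipher;
    }

    // Encrypts in one shot; the output is padded up to a multiple of 16 bytes.
    static byte[] encrypt(byte[] plain) throws Exception {
        return newCipher(Cipher.ENCRYPT_MODE).doFinal(plain);
    }

    // Decrypts through a stream, reading in small chunks, and collects the result.
    static byte[] decryptViaStream(byte[] encrypted) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(encrypted), newCipher(Cipher.DECRYPT_MODE))) {
            byte[] buf = new byte[7]; // deliberately not aligned to the block size
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] plain = new byte[37]; // not a multiple of 16
        Arrays.fill(plain, (byte) 0x5A);
        byte[] encrypted = encrypt(plain);
        byte[] decrypted = decryptViaStream(encrypted);
        System.out.println("encrypted length: " + encrypted.length);
        System.out.println("round trip equal: " + Arrays.equals(plain, decrypted));
    }
}
```

If the round trip comes back equal, the padding is being stripped by the stream as expected, which points away from the cipher and toward the zip reader.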

Thanks

user1219278
  • AFAIK Zip files should be seekable, given that they have the directory index at the end - have you tried running the same code on the same file without encryption? – Tassos Bassoukos Apr 28 '14 at 16:55
  • @TassosBassoukos not sure I understand you correctly - but the same decrypted file unzips perfectly when using ZipFile (completely read into memory), but ZipInputStream will fail on it. – user1219278 Apr 28 '14 at 17:41

1 Answer


Use ZipInputStream on the decrypted file on disk, without reading it into memory. If that fails, the file can't be read as a stream anyway and needs to be recreated (it could be slightly non-standard). If it succeeds, write out the output of the decryption stream (before it is passed to ZipInputStream) and check it for binary differences against the known-good decrypted file.
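The binary comparison can be done without holding either side in memory. This is only a rough sketch: `StreamDiff` and `firstMismatch` are made-up names, and in practice the two arguments would be the CipherInputStream and a FileInputStream over the known-good decrypted file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamDiff {
    // Returns the offset of the first differing byte, or -1 if both streams
    // have identical content. If one stream ends early, the offset where it
    // ended is returned.
    static long firstMismatch(InputStream a, InputStream b) throws IOException {
        InputStream x = new BufferedInputStream(a);
        InputStream y = new BufferedInputStream(b);
        long pos = 0;
        while (true) {
            int bx = x.read();
            int by = y.read();
            if (bx != by) {
                return pos; // differing byte, or one side hit end of stream first
            }
            if (bx == -1) {
                return -1;  // both streams ended together with no difference
            }
            pos++;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] good = {1, 2, 3, 4};
        byte[] withTrailingBytes = {1, 2, 3, 4, 0, 0};
        long offset = firstMismatch(
                new ByteArrayInputStream(good),
                new ByteArrayInputStream(withTrailingBytes));
        System.out.println("first mismatch at offset: " + offset);
    }
}
```

A mismatch offset equal to the length of the shorter stream would suggest leftover trailing bytes such as padding; a mismatch earlier in the data would suggest a decryption problem.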

Tassos Bassoukos
  • Yes, decrypting to a file and then feeding that file to ZipInputStream works fine. I have a feeling it's related to the padding. The server-side encryption adds padding at the end if the output length is not a multiple of 16. Maybe that's messing things up when everything is done via streams? – user1219278 Apr 28 '14 at 21:35
  • The full directory of a zip archive is at the end of the file, so if there's padding left over it will cause you issues. – Tassos Bassoukos Apr 28 '14 at 23:14
  • If accessing as a stream, does it matter if the directory is at the end? The ZipInputStream is just reading entries from start to end. The exception is thrown after extracting a few entries. – user1219278 Apr 28 '14 at 23:36
  • OK, it turns out that it had nothing to do with the decryption. I removed that from the workflow. ZipInputStream just plain fails on some of my zips, while ZipFile works every time. I tried Apache's ZipArchiveInputStream (from Commons Compress), and that could handle the ones that ZipInputStream could not. – user1219278 Apr 30 '14 at 03:13
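For the ZipFile route mentioned in the comments, here is a minimal sketch assuming the decrypted archive is already on disk. As far as I know, ZipFile looks entries up through the central directory and opens each entry's data on demand rather than loading the whole archive, so reading entries with a fixed buffer should keep memory use small. The class name `ZipFileListing` and the sample archive are made up for the demo.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipFileListing {
    // Lists entry names and drains each entry through a fixed-size buffer.
    static List<String> entryNames(File zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            byte[] buf = new byte[8192];
            while (entries.hasMoreElements()) {
                ZipEntry ze = entries.nextElement();
                names.add(ze.getName());
                try (InputStream in = zf.getInputStream(ze)) {
                    while (in.read(buf) != -1) {
                        // A real program would process the bytes here.
                    }
                }
            }
        }
        return names;
    }

    // Builds a tiny two-entry archive in a temp file, just for the demo.
    static File writeSampleZip() throws IOException {
        File f = File.createTempFile("sample", ".zip");
        f.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f))) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("world".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        for (String name : entryNames(writeSampleZip())) {
            System.out.println("Found zip entry: " + name);
        }
    }
}
```

The trade-off is that ZipFile needs a seekable file, so the decrypted data has to be written to disk first instead of being piped straight from the CipherInputStream.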