I am trying to to extract files out of a nested zip archive and process them in memory.
What this question is not about:
How to read a zip file in Java: NO, the question is how to read a zip file within a zip file within a zip and so on and so forth (as in nested zip files).
Write temporary results on disk: NO, I'm asking about doing it all in memory. I found many answers using the not-so-efficient technique of writing results temporarily to disk, but that's not what I want to do.
Example:
Zipfile -> Zipfile1 -> Zipfile2 -> Zipfile3
Goal: extract the data found in each of the nested zip files, all in memory and using Java.
ZipFile is the answer, you say? NO, it is not, it works for the first iteration, that is for:
Zipfile -> Zipfile1
But once you get to Zipfile2, and perform a:
ZipInputStream z = new ZipInputStream(zipFile.getInputStream( zipEntry) ) ;
you will get a NullPointerException.
My code:
public class ZipHandler {
String findings = new String();
ZipFile zipFile = null;
public void init(String fileName) throws AppException{
try {
//read file into stream
zipFile = new ZipFile(fileName);
Enumeration<?> enu = zipFile.entries();
exctractInfoFromZip(enu);
zipFile.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
//The idea was recursively extract entries using ZipFile
public void exctractInfoFromZip(Enumeration<?> enu) throws IOException, AppException{
try {
while (enu.hasMoreElements()) {
ZipEntry zipEntry = (ZipEntry) enu.nextElement();
String name = zipEntry.getName();
long size = zipEntry.getSize();
long compressedSize = zipEntry.getCompressedSize();
System.out.printf("name: %-20s | size: %6d | compressed size: %6d\n",
name, size, compressedSize);
// directory ?
if (zipEntry.isDirectory()) {
System.out.println("dir found:" + name);
findings+=", " + name;
continue;
}
if (name.toUpperCase().endsWith(".ZIP") || name.toUpperCase().endsWith(".GZ")) {
String fileType = name.substring(
name.lastIndexOf(".")+1, name.length());
System.out.println("File type:" + fileType);
System.out.println("zipEntry: " + zipEntry);
if (fileType.equalsIgnoreCase("ZIP")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip
ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry) ) ;
System.out.println("Opening ZIP as stream: " + name);
findings+=", " + name;
exctractInfoFromZip(zipInputStreamToEnum(z));
} else if (fileType.equalsIgnoreCase("GZ")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip
GZIPInputStream z = new GZIPInputStream(zipFile.getInputStream(zipEntry) ) ;
System.out.println("Opening ZIP as stream: " + name);
findings+=", " + name;
exctractInfoFromZip(gZipInputStreamToEnum(z));
} else
throw new AppException("extension not recognized!");
} else {
System.out.println(name);
findings+=", " + name;
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Findings " + findings);
}
public Enumeration<?> zipInputStreamToEnum(ZipInputStream zStream) throws IOException{
List<ZipEntry> list = new ArrayList<ZipEntry>();
while (zStream.available() != 0) {
list.add(zStream.getNextEntry());
}
return Collections.enumeration(list);
}