I have a zip file which contains zip files (which may themselves contain zip files).

parent.zip
|- child_1.zip
|  |- foo.txt
|
|- child_2.zip
|  |- bar.txt
|
|- baz.txt

Using ZipFile, I can get the ZipEntries of the parent zip file, and see the children (child_1.zip, child_2.zip, baz.txt), but I cannot find a way to examine the contents of those child zips (foo.txt, bar.txt) without inflating the parent zip.

Is this possible, or do I need to inflate parent.zip?

Drew Stevens
  • The file index is located at the end of a zip file. How will you jump around to a specific byte number in a zipped stream? – Thorbjørn Ravn Andersen Jan 03 '19 at 21:49
  • I'm not familiar with the way that zips work, but I suggest you peek at this: https://stackoverflow.com/questions/47208932/how-to-read-data-from-nested-zip-files-in-java-without-using-temporary-files – Cardinal System Jan 03 '19 at 21:51
  • Zipping zip files isn't really great design. You can't tell they're zip files except by guessing from the extension and examining the first few bytes of the stream. Java certainly can't do that for you. – markspace Jan 03 '19 at 21:57
  • @markspace Sensible ZIP files will begin with an entry which has a local header that starts with a magic number. Products wishing to avoid dodgy ZIPs (including executable ZIPs) will check for this. – Tom Hawtin - tackline Jan 03 '19 at 22:35
  • @markspace - you're not wrong, but it's how the customer is passing the data. – Drew Stevens Jan 04 '19 at 14:43

2 Answers

One can use a zip file system using the jar:file: protocol:

    URI uri = new URI(
        "jar:file:/home/.../.../external.zip!/.../internal.zip!/");
    Map<String, ?> env = new HashMap<>();
    try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
        Path rootPath2 = zipfs.getPath("/");
        Files.walk(rootPath2).forEach(p -> {
            System.out.printf("Path %s%n", p.toString());
        });
    }

For a recursive walk one has to create URIs with an added "!/", and do the recursion oneself.

Using Files one can copy files out of and into a zip file system. (Here I have some doubts.)
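As a conservative fallback, the copy-out step mentioned above can be sketched as follows: only the nested archive is copied to a temporary file (not the whole parent), and that copy is then opened as its own zip file system. The class `NestedZipFs` and the method `listInner` are hypothetical names for illustration. Whether a zip file system can be opened directly on a path inside another zip file system appears to depend on the JDK version, so this sketch does not rely on it.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of the JDK.
final class NestedZipFs {

    /** Lists the regular files of the archive stored as entry innerName inside outerZip. */
    static List<String> listInner(Path outerZip, String innerName) throws IOException {
        Path tmp = Files.createTempFile("inner", ".zip");
        // The (ClassLoader) cast selects the newFileSystem(Path, ClassLoader) overload.
        try (FileSystem outerFs = FileSystems.newFileSystem(outerZip, (ClassLoader) null)) {
            // Copy only the nested archive out of the parent; nothing else is extracted.
            Files.copy(outerFs.getPath("/", innerName), tmp,
                    StandardCopyOption.REPLACE_EXISTING);
            try (FileSystem innerFs = FileSystems.newFileSystem(tmp, (ClassLoader) null);
                 Stream<Path> paths = Files.walk(innerFs.getPath("/"))) {
                return paths.filter(Files::isRegularFile)
                        .map(Path::toString)
                        .collect(Collectors.toList());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NestedZipFs parent.zip child_1.zip
        for (String name : listInner(Paths.get(args[0]), args[1])) {
            System.out.println(name);
        }
    }
}
```

The temporary file costs some disk space, but in exchange the inner archive gets real random access through its central directory.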

Joop Eggen
  • I couldn't get it working. Please check my question: https://stackoverflow.com/questions/56124117/how-to-read-a-file-from-a-nested-zip-file-using-java-nio – Uli May 14 '19 at 06:38

This isn't a problem with zip files themselves (though it is a horrific format), but with the java.util.zip API, and probably with zlib, which is typically used to implement it.

ZipFile requires a File which it likes to memory map. If the "file" is actually a nested entry, that's not going to fly unless you copy it out, or have some OS-specific trick up your sleeve.

If the nested zip file is compressed within the outer zip file, random access is obviously out. You would need a different API anyway. However, java.util.zip does have ZipInputStream. Don't treat it as an InputStream - that's a typically strange subtyping arrangement. It does allow you to stream out entries, even if the archive is a compressed entry of the outer file.
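To make the streaming approach concrete, here is a minimal sketch that layers one ZipInputStream over another and recurses into every entry whose name ends in .zip, so archives nested to any depth are listed without temporary files. The class `NestedZipLister`, its methods, and the `!/` separator in the printed names are assumptions made for illustration. Detecting nested archives by extension is also an assumption; checking the magic number would be stricter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of java.util.zip.
final class NestedZipLister {

    /** Lists every entry reachable from a zip stream, descending into entries named *.zip. */
    static List<String> list(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        walk(new ZipInputStream(in), "", names);
        return names;
    }

    private static void walk(ZipInputStream zip, String prefix, List<String> names)
            throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String name = prefix + entry.getName();
            names.add(name);
            // Assumption: nested archives are recognised by their extension only.
            if (!entry.isDirectory() && name.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // Layer a second ZipInputStream over the current entry's data.
                // It is deliberately not closed: closing it would close the outer stream.
                walk(new ZipInputStream(zip), name + "!/", names);
            }
        }
    }

    /** Builds a small archive in memory so the sketch runs without files on disk. */
    static byte[] zip(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        Map<String, byte[]> parent = new LinkedHashMap<>();
        parent.put("child_1.zip", zip(Collections.singletonMap("foo.txt", text)));
        parent.put("child_2.zip", zip(Collections.singletonMap("bar.txt", text)));
        parent.put("baz.txt", text);
        for (String name : list(new ByteArrayInputStream(zip(parent)))) {
            System.out.println(name);
        }
    }
}
```

Because everything is read sequentially, entries come back in the order they are stored, and reaching a deeply nested file means decompressing everything stored before it in each enclosing archive.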

(Roughly, ZIP files work like this: At the end of the file is a central directory. In order to access the archive in a random access manner, you need to load the end of the file and read that in. It contains names, lengths, etc., as well as an offset to each entry in the file. The entries themselves contain names, lengths, etc., and the actual file contents. No, the two needn't be consistent, or have any kind of 1-1 correlation. The entries may also contain other lies, such as the decompressed length being wrong or -1. Anyway, you can ignore the central directory and read the entries sequentially.

JARs add to the fun by adding a META-INF/INDEX.LIST and a META-INF/MANIFEST.MF as the first entries of the file. The former contains an index, similar to the central directory, but at the front rather than the end. The latter may contain a listing of the files together with signatures. Executable zips and GIFARs (and I think similar, earlier discovered equivalents for Microsoft products) may have something stuffed in front of the zip, so you have to go in through the rear for those.)
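To illustrate why random access starts from the rear of the file, the following sketch scans an in-memory archive backwards for the end-of-central-directory record and reads the entry count and the central directory offset from it. The class `EndOfCentralDirectory` and its methods are hypothetical names. The signatures and field offsets follow the ZIP format as I understand it, and Zip64 archives are deliberately ignored.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; Zip64 records are not handled.
final class EndOfCentralDirectory {
    static final int EOCD_SIGNATURE = 0x06054b50; // bytes "PK\5\6" read as little-endian int
    static final int EOCD_MIN_SIZE = 22;          // record size when the archive comment is empty

    /** Returns the offset of the end-of-central-directory record, or -1 if none is found. */
    static int find(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        // Scan backwards: a trailing archive comment may push the record away from the end.
        for (int pos = zip.length - EOCD_MIN_SIZE; pos >= 0; pos--) {
            if (buf.getInt(pos) == EOCD_SIGNATURE) {
                return pos;
            }
        }
        return -1;
    }

    /** Total number of entries recorded in the central directory (unsigned 16-bit field). */
    static int entryCount(byte[] zip, int eocd) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getShort(eocd + 10) & 0xFFFF;
    }

    /** Offset of the first central directory header (unsigned 32-bit field). */
    static long centralDirectoryOffset(byte[] zip, int eocd) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getInt(eocd + 16) & 0xFFFFFFFFL;
    }

    /** Builds a small archive in memory, one entry per name, for demonstration. */
    static byte[] sampleZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(name.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip("foo.txt", "bar.txt");
        int eocd = find(zip);
        System.out.println("EOCD record at byte " + eocd + " of " + zip.length);
        System.out.println("entries: " + entryCount(zip, eocd));
        System.out.println("central directory starts at byte " + centralDirectoryOffset(zip, eocd));
    }
}
```

A ZipInputStream never looks at this record. It simply reads local headers from the front until it reaches something that is not one, which is why it also works on a stream with no random access.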

A small demonstration program.

import java.io.*;
import java.util.zip.*;

interface Code {
    static void main(String[] args) throws Exception {
        // Open the outer archive with random access; try-with-resources closes it.
        try (ZipFile zipZip = new ZipFile("zip.zip.zip")) {
            ZipEntry zipEntry = zipZip.getEntry("zip.zip");
            if (zipEntry == null) {
                throw new Error("zip.zip not found");
            }

            // Stream the nested archive straight out of the outer entry.
            try (ZipInputStream zip = new ZipInputStream(zipZip.getInputStream(zipEntry))) {
                for (;;) {
                    ZipEntry entry = zip.getNextEntry();
                    if (entry == null) {
                        break;
                    }
                    System.err.println(entry.getName());
                    // Reads only up to the end of the current entry.
                    // Do not close this reader: that would close the ZipInputStream itself.
                    new BufferedReader(new InputStreamReader(zip)).lines().forEach(l -> {
                        System.err.println("> " + l);
                    });
                }
            }
        }
    }
}
Tom Hawtin - tackline