Efficient way to read a small file from a very large Zip file in Java

Question

I was wondering whether there is an efficient way to read a small file from a very large zip. I am not interested in any file in the zip except a small one called inventory.xml.

To be exact, the zip file resides in Artifactory, so I don't want to download the entire file to my disk either. Here is what I have now:

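    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    URL url = new URL("artifactory-url");
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    con.setRequestMethod("GET");
    int status = con.getResponseCode();
    if (status != HttpURLConnection.HTTP_OK) {
        System.out.println("Unable to find the artifact : " + url);
        return bugs;
    }
    // ZipInputStream parses entries one after another as the response body arrives
    try (ZipInputStream zipStream = new ZipInputStream(con.getInputStream())) {
        ZipEntry entry;
        while ((entry = zipStream.getNextEntry()) != null) {
            // endsWith avoids also matching names such as "inventory.xml.bak"
            if (entry.getName().endsWith("inventory.xml")) {
                // do something here, e.g. zipStream.readAllBytes() reads just this entry
                break; // stop as soon as the wanted entry has been handled
            }
        }
    }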

Another question: if I know the coordinates of the file, would that help?

@DhanasekaranDon it loops through each ZipEntry, and if it's a huge zip file with a lot of entries, it takes a good amount of time. — grigory mendel, Oct 22 '21 at 04:18
Nothing you can do about that. It's a sequential format. Make sure you break out of the loop after you've dealt with the file you want. — user207421, Oct 22 '21 at 04:25
`con.getInputStream()` gets the complete file, so you cannot just get parts of the file. So, as @user207421 said, you cannot do anything about that. — Renis1235, Oct 22 '21 at 06:34
@Renis1235 `getInputStream()` returns the input stream. You then have to read data from it, as much as you need. You haven't downloaded the whole file. Just the part you read. — user207421, Oct 22 '21 at 08:21
@user207421 I do break out once I find the required file. The problem is that there are huge files that don't contain the file I'm looking for, so for those it loops until the end. — grigory mendel, Oct 22 '21 at 13:57
@grigorymendel Very good, so there is nothing you can do, except find a way not to download files that don't have what you're looking for in them, or have the individual files available at the server instead of a huge ZIP file, which isn't much use to anyone really. — user207421, Oct 22 '21 at 23:34
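For reference, here is a minimal sketch of the sequential approach the comments describe, written as a standalone helper; `StreamingLookup` and `readFirstMatching` are hypothetical names, and it assumes the wanted entry can be identified by a name suffix. It returns only the matching entry's bytes and stops parsing entries as soon as it has them. When there is no match it necessarily reads to the end of the stream, which is exactly the cost discussed above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch only: names are illustrative, not from any library.
final class StreamingLookup {
    /**
     * Returns the uncompressed contents of the first entry whose name ends with
     * suffix, or null if the archive has no such entry.
     */
    static byte[] readFirstMatching(InputStream in, String suffix) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().endsWith(suffix)) {
                    // read() returns -1 at the end of the current entry, so this
                    // collects exactly one entry's bytes; later entries are never parsed.
                    return zip.readAllBytes();
                }
            }
            // No match: every entry had to be skipped over to find that out.
            return null;
        }
    }
}
```

In the question's code, `in` would be `con.getInputStream()`.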

score 1 · Answer 1 · answered Oct 22 '21 at 15:58

ZIP files store their central directory at the end of the file, so if you have some way to randomly access the file's contents, you could do this.
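As a rough illustration of the first step, the sketch below finds the End of Central Directory (EOCD) record in the last bytes of an archive and reads where the central directory starts and how long it is. It assumes a non-ZIP64, single-disk archive, and `EocdLocator`/`locate` are hypothetical names. Because the trailing archive comment is at most 65,535 bytes, the last 65,557 bytes (22 + 65,535) always contain the record; over HTTP they could be requested with a suffix range such as `Range: bytes=-65557`, if the server supports it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch only: assumes a plain (non-ZIP64), single-disk archive.
final class EocdLocator {
    static final int EOCD_SIGNATURE = 0x06054b50; // bytes "PK\5\6" read little-endian
    static final int EOCD_FIXED_SIZE = 22;         // record size without the trailing comment
    static final int MAX_COMMENT_LENGTH = 0xFFFF;  // the comment length is a 16-bit field

    /** Where the central directory lives, as recorded in the EOCD. */
    static final class CentralDirectory {
        final long offset;     // from the start of the archive
        final long size;       // in bytes
        final int entryCount;

        CentralDirectory(long offset, long size, int entryCount) {
            this.offset = offset;
            this.size = size;
            this.entryCount = entryCount;
        }
    }

    /** How many trailing bytes are always enough to contain the EOCD record. */
    static int tailLength() {
        return EOCD_FIXED_SIZE + MAX_COMMENT_LENGTH;
    }

    /** Parses the EOCD from the last bytes of the archive; tail must end at end-of-file. */
    static CentralDirectory locate(byte[] tail) {
        ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);
        // Scan backwards, because an archive comment may follow the record.
        for (int i = tail.length - EOCD_FIXED_SIZE; i >= 0; i--) {
            int commentLength = Short.toUnsignedInt(buf.getShort(i + 20));
            // Accept only if the signature matches and the comment reaches exactly to EOF,
            // which rules out stray signature bytes inside the comment itself.
            if (buf.getInt(i) == EOCD_SIGNATURE
                    && commentLength == tail.length - i - EOCD_FIXED_SIZE) {
                int entries = Short.toUnsignedInt(buf.getShort(i + 10)); // total entries
                long cdSize = Integer.toUnsignedLong(buf.getInt(i + 12));
                long cdOffset = Integer.toUnsignedLong(buf.getInt(i + 16));
                return new CentralDirectory(cdOffset, cdSize, entries);
            }
        }
        throw new IllegalArgumentException("No EOCD record found in the supplied tail");
    }
}
```

The returned offset and size say exactly which byte range to fetch next to list every entry without touching the file data.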

However, that's a big "if": it requires Artifactory to support byte-range GETs, and requires you to re-implement (or find/adapt) the code to read the directory structure and retrieve a file from the middle of the archive.
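To make that "if" concrete, here is a sketch of the remaining steps: fetch the central directory with a byte-range GET, walk it to find `inventory.xml`, then fetch and inflate just that entry. It assumes the server answers `Range` requests with `206 Partial Content`, that the central directory offset and size are already known (they are stored in the End of Central Directory record at the end of the archive), and a non-ZIP64 archive whose entries are unencrypted and STORED or DEFLATED. `RemoteZipReader`, `RangeReader` and `readEntry` are hypothetical names. For a file on local disk, `java.util.zip.ZipFile` already does this kind of random access for you.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Sketch only: non-ZIP64, unencrypted entries, STORED or DEFLATED, sizes < 2 GiB.
final class RemoteZipReader {
    /** Returns bytes [start, start + length) of the archive. */
    interface RangeReader {
        byte[] read(long start, int length) throws IOException;
    }

    /** Byte-range GET; assumes the server honours "Range" and answers 206 Partial Content. */
    static RangeReader http(URL url) {
        return (start, length) -> {
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            try {
                con.setRequestProperty("Range", "bytes=" + start + "-" + (start + length - 1));
                int status = con.getResponseCode();
                if (status != HttpURLConnection.HTTP_PARTIAL) {
                    throw new IOException("Range request not honoured, HTTP " + status);
                }
                try (InputStream in = con.getInputStream()) {
                    return in.readAllBytes();
                }
            } finally {
                con.disconnect();
            }
        };
    }

    /** Returns the uncompressed bytes of the first entry whose name ends with suffix, or null. */
    static byte[] readEntry(RangeReader reader, long cdOffset, int cdSize, String suffix)
            throws IOException, DataFormatException {
        ByteBuffer cd = ByteBuffer.wrap(reader.read(cdOffset, cdSize)).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 0;
        // Each central directory header is 46 fixed bytes followed by name, extra and comment.
        while (pos + 46 <= cdSize && cd.getInt(pos) == 0x02014b50) {
            int method = Short.toUnsignedInt(cd.getShort(pos + 10));
            int compressedSize = cd.getInt(pos + 20);
            int uncompressedSize = cd.getInt(pos + 24);
            int nameLength = Short.toUnsignedInt(cd.getShort(pos + 28));
            int extraLength = Short.toUnsignedInt(cd.getShort(pos + 30));
            int commentLength = Short.toUnsignedInt(cd.getShort(pos + 32));
            long localHeaderOffset = Integer.toUnsignedLong(cd.getInt(pos + 42));
            // Names are UTF-8 when flag bit 11 is set; ASCII names decode the same either way.
            String name = new String(cd.array(), pos + 46, nameLength, StandardCharsets.UTF_8);
            if (name.endsWith(suffix)) {
                // The local header has its own name/extra lengths, which may differ from the CD's.
                ByteBuffer local = ByteBuffer.wrap(reader.read(localHeaderOffset, 30))
                        .order(ByteOrder.LITTLE_ENDIAN);
                long dataStart = localHeaderOffset + 30
                        + Short.toUnsignedInt(local.getShort(26))
                        + Short.toUnsignedInt(local.getShort(28));
                byte[] data = reader.read(dataStart, compressedSize);
                if (method == 0) {
                    return data; // STORED: no compression
                }
                if (method != 8) {
                    throw new IOException("Unsupported compression method " + method);
                }
                return inflate(data, uncompressedSize);
            }
            pos += 46 + nameLength + extraLength + commentLength;
        }
        return null;
    }

    private static byte[] inflate(byte[] data, int uncompressedSize)
            throws IOException, DataFormatException {
        Inflater inflater = new Inflater(true); // raw DEFLATE, no zlib header
        try {
            // Extra dummy byte, as the Inflater(boolean nowrap) javadoc asks for.
            inflater.setInput(Arrays.copyOf(data, data.length + 1));
            byte[] out = new byte[uncompressedSize];
            int n = 0;
            while (n < out.length && !inflater.finished()) {
                int k = inflater.inflate(out, n, out.length - n);
                if (k == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IOException("Truncated or corrupt entry");
                }
                n += k;
            }
            if (n != out.length) {
                throw new IOException("Entry shorter than its recorded size");
            }
            return out;
        } finally {
            inflater.end();
        }
    }
}
```

The `RangeReader` indirection lets the same parsing code run against an in-memory byte array; against Artifactory you would pass `RemoteZipReader.http(url)`. Apart from the tail of the archive, only the central directory and the one entry cross the network, so an archive that lacks `inventory.xml` is ruled out after reading just its directory.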

If you need to do this frequently, a far better solution is to change the process that puts those files in Artifactory in the first place. If it's packaged in a JAR that's produced by Maven or another build tool, then it's a simple matter of extracting the files into their own dependency.

score 0 · Accepted Answer · answered Nov 10 '21 at 09:22

As many of you pointed out, the code in the question itself is probably the most efficient solution. Thank you for the help anyway.

grigory mendel
