
I have the following snippet, which checks whether a given zip entry is a directory, and it works fine. There is an additional requirement: I need to check the size of each image inside the folder in the zip file (an image must not be larger than 10 MB). I have been going through some articles but couldn't find a scenario similar to mine.

An example structure of a zip file, with the folder and the sizes of the images inside it, is shown below:

XYZ.zip>023423>Bat1.jpg ->11MB 
XYZ.zip>023423>Bat2.jpg ->5MB 
XYZ.zip>023423>Bat3.jpg ->11MB
XYZ.zip>023423>Bat4.jpg ->10MB

Based on the scenario above, at the end of the execution I should get Bat1 and Bat3 as output, since their size is > 10 MB. Kindly advise.

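private void isGivenZipInFolderStructure(ExcelImportCronJobModel cronJob) {
    foldersInZip = new ArrayList<>();
    if (cronJob.getReferencedContent() != null) {
        // try-with-resources closes the ZipInputStream (and the underlying media stream)
        try (ZipInputStream zip = new ZipInputStream(
                this.mediaService.getStreamFromMedia(cronJob.getReferencedContent()))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Often -1 (unknown) when the entry is read through a ZipInputStream
                LOG.info("Size of the entry {}", entry.getSize());
                if (entry.isDirectory()) {
                    // Keep only the top-level folder name, e.g. "023423"
                    foldersInZip.add(entry.getName().split(BunningsCoreConstants.FORWARD_SLASH)[0]);
                }
            }
        } catch (IOException e) {
            // Pass the exception itself (not inside the message string) so its stack trace is logged
            LOG.error("Error reading zip", e);
        }
    }
}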
Karthik
  • [`ZipEntry#getSize`](https://docs.oracle.com/javase/10/docs/api/java/util/zip/ZipEntry.html#getSize()), *"Returns the uncompressed size of the entry data."* – MadProgrammer May 04 '21 at 01:19
  • Hi @MadProgrammer, I did print the size, as you can see in my snippet, and below is what I got in the console. I don't think it's giving a valid size. Not sure if I am missing anything. INFO [hybrisHTTP18] [ExcelImportValidator] Size of the entry -1 INFO [hybrisHTTP18] [ExcelImportValidator] Size of the entry 0 – Karthik May 04 '21 at 01:33
  • So a size of `-1` indicates that the size is unknown (as per the documentation), but `0` might suggest it's a folder entry or hasn't been compressed properly – MadProgrammer May 04 '21 at 01:36
  • So, I ran a really quick test with a zip file with multiple directories and files and didn't have any issues – MadProgrammer May 04 '21 at 01:45
  • Ah, okay, the issue is with `ZipInputStream`: apparently, if you use it, it will return `-1` for the size, but `ZipFile` works fine - [have a look at this](https://stackoverflow.com/questions/11098253/getting-the-size-of-zipinputstream) for some more details – MadProgrammer May 04 '21 at 01:50
  • Hi @MadProgrammer, that really helps. However, as I am using Hybris, the uploaded zip file is saved as a media, and I tried something like this: `FileInputStream fis = new FileInputStream(Config.getString("local.domain.url","").concat(cronJob.getReferencedContent().getURL2()));` The output of that string would be like https://localhost:9002/medias/imageswithFolder.zip?context=bW . If I hit this in the browser, I am able to download the file, but once the above line is executed I get "Error reading zip, e". Is there any other alternative solution that I can try? Appreciate your help. – Karthik May 04 '21 at 03:08
  • I modified the code to the following: `FileInputStream fis = new FileInputStream("//Users/karnagar2//Desktop//logs//imageswithFolder.zip"); ZipInputStream zis = null; int size = fis.available(); System.out.println("size in KB : " + size/1024); zis = new ZipInputStream(fis); ZipEntry ze; while ((ze = zis.getNextEntry()) != null) { System.out.println(ze.getSize()); }` The output is as follows: `size in KB : 10015`, `0`, `-1`, `-1`. So it still doesn't print the size of the individual entries. – Karthik May 04 '21 at 03:31
  • Yep, that continues to be an issue with using `FileInputStream` this way - there's no way for the `ZipInputStream` to know ahead of time the uncompressed size of the data - this is the domain of the `ZipFile` – MadProgrammer May 04 '21 at 03:38
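Following the `ZipFile` suggestion in the comments: a Hybris media is not a local file, so `FileInputStream` cannot open its URL, but the media stream can be copied to a temporary file first and the sizes then read from the ZIP's central directory. Below is a minimal sketch under that assumption; the class `ZipFileSizeCheck` and method `findOversizedEntries` are hypothetical names, and the limit is passed in bytes so you can decide whether "10 MB" means 10,000,000 or 10 × 1024 × 1024.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipFileSizeCheck {

    /**
     * Hypothetical helper: copies the ZIP stream to a temporary file so that
     * ZipFile can read the central directory, where each entry's uncompressed
     * size is recorded. Returns the names of file entries larger than maxBytes.
     * The caller remains responsible for closing zipStream.
     */
    public static List<String> findOversizedEntries(InputStream zipStream, long maxBytes)
            throws IOException {
        Path tmp = Files.createTempFile("upload-", ".zip");
        try {
            // createTempFile already created the file, so allow overwriting it
            Files.copy(zipStream, tmp, StandardCopyOption.REPLACE_EXISTING);
            List<String> tooBig = new ArrayList<>();
            try (ZipFile zipFile = new ZipFile(tmp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zipFile.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    // Here getSize() comes from the central directory, so it is normally known
                    if (!entry.isDirectory() && entry.getSize() > maxBytes) {
                        tooBig.add(entry.getName());
                    }
                }
            }
            return tooBig;
        } finally {
            // Remove the temporary copy whether or not reading succeeded
            Files.deleteIfExists(tmp);
        }
    }
}
```

In the cron job this could be called as something like `findOversizedEntries(mediaService.getStreamFromMedia(cronJob.getReferencedContent()), 10L * 1024 * 1024)`, which should report entries such as `023423/Bat1.jpg` and `023423/Bat3.jpg` for the layout in the question. The trade-off is that the whole ZIP is written to disk once before it is inspected.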

1 Answer


As mentioned in the comments, `getSize()` usually returns `-1` (size unknown) when reading from a `ZipInputStream`, unlike when using `ZipFile`. However, you could scan the stream contents yourself and measure the size of each entry.

This method scans any ZIP passed in as InputStream which could be derived from a file or other downloaded source:

public static void scan(InputStream is) throws IOException {
    // Needs java.io.* and java.util.zip.*; InputStream.transferTo requires Java 9+
    System.out.println("==== scanning "+is);
    ZipEntry ze;

    // Uses ByteArrayOutputStream to determine the size of the entry
    ByteArrayOutputStream bout = new ByteArrayOutputStream();

    // 10_000_000 bytes; use 10L * 1024 * 1024 if "10 MB" means 10 MiB
    long maxSize = 10_000_000L;
    try (ZipInputStream zis = new ZipInputStream(is)) {
        while ((ze = zis.getNextEntry()) != null) {
            bout.reset();
            // Copies the current entry up to its end and returns the number of bytes copied
            long size = zis.transferTo(bout);
            System.out.println(ze.getName()
                                +(ze.isDirectory() ? " DIR" : " FILE["+size+"]")
                                +(size  > maxSize ? " *** TOO BIG ***":"")
                                );
            if (size > maxSize) {
                //  entry which is too big - do something / warning ...
            } // else use content: byte[] content = bout.toByteArray();
        }
    }
}

This approach is not ideal for very large ZIP content, since each entry is buffered in memory, but it may be worth trying for your specific case - better to have a slow solution than none at all.

If there are really big entries in the ZIP, you might also consider replacing the line `long size = zis.transferTo(bout);` with a call to your own method which does not transfer the content but still returns the size, similar to the implementation of `InputStream.transferTo` but with the `write()` call commented out.
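A minimal sketch of that idea: `countBytes` reads the current entry through a fixed 8 KB buffer and only counts the bytes, so memory use stays constant regardless of entry size. `ZipStreamSizeCheck`, `countBytes` and `findOversizedEntries` are hypothetical names, and this variant returns the oversized entry names instead of printing them, to match the "Bat1 & Bat3 as output" requirement in the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipStreamSizeCheck {

    /** Hypothetical helper: reads and discards the remaining data, returning how many bytes were read. */
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read; // no write(): the bytes are simply dropped
        }
        return total;
    }

    /** Returns the names of file entries whose uncompressed size exceeds maxBytes. */
    public static List<String> findOversizedEntries(InputStream is, long maxBytes)
            throws IOException {
        List<String> tooBig = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(is)) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                if (ze.isDirectory()) {
                    continue; // folders carry no image data
                }
                // ZipInputStream.read() returns -1 at the end of the current entry,
                // so this counts exactly one entry's uncompressed bytes
                long size = countBytes(zis);
                if (size > maxBytes) {
                    tooBig.add(ze.getName());
                }
            }
        }
        return tooBig;
    }
}
```

Unlike the `ZipFile` route, nothing is written to disk, but every entry still has to be decompressed once to learn its size.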

DuncG