
Scenario: I have code that calls a SOAP web service and receives an attachment, which is a zip file. The code unzips it, iterates over the entries, picks out the one file I want (a CSV file), and reads the content of that CSV file:

public static void unzipTry2(AttachmentPart att) throws IOException, SOAPException {
    try (ZipInputStream zis = new ZipInputStream(att.getRawContent())) {
        byte[] buffer = new byte[1024];
        for (ZipEntry zipEntry = zis.getNextEntry(); zipEntry != null; zipEntry = zis.getNextEntry()) {
            if (zipEntry.isDirectory()) {
                continue;
            }
            if (!zipEntry.getName().equals("FileIwant.csv")) {
                continue; //if it's not the file I want, skip this file
            }
            System.out.println(zipEntry.getName());
            for (int len = zis.read(buffer); len > 0; len = zis.read(buffer)) {
                //System.out.write(buffer, 0, len);
                String testString = new String(buffer,0,len);
                processCSVString(testString);
            }

        }
    }
}

It works just fine. However, the CSV file I am getting contains only one line. That is expected for now, but in the future it may contain multiple lines. Since it's a CSV file, I need to parse it LINE BY LINE. The code also has to work when the CSV file contains multiple lines, and I am not sure that it does, because I have no way to test it (I don't control the input of this method; it all comes from the web service).

Can you tell me if the inner for loop reads the content of the file LINE by LINE?

            for (int len = zis.read(buffer); len > 0; len = zis.read(buffer)) {
                //System.out.write(buffer, 0, len);
                String testString = new String(buffer,0,len);
                processCSVString(testString);
            }
Bobulous
mchl45
  • Why not write the file from the buffer into a temporary file, and then use a Reader on the temporary file to go line by line? That scales better than checking everything in the buffer for line breaks. – Compass Jul 10 '19 at 19:20
  • @Compass I'm expected to keep everything in memory – mchl45 Jul 10 '19 at 20:11
  • If it has to be in memory, you should convert the byte array to a reader and process the reader line by line. https://www.baeldung.com/java-convert-byte-array-to-reader – Compass Jul 10 '19 at 20:22
  • @Compass when you say convert the "byte array", do you mean the byte array that contains the one particular file that I want (the CSV file)? How can I get that one byte array? – mchl45 Jul 10 '19 at 20:29
  • Instead of converting the buffer to a string, create a stream that reads from the buffer as bytes and use that output as a reader. – Compass Jul 10 '19 at 20:34
  • @Compass do you know why that inner for loop only iterates once? Do you know if one iteration reads the whole CSV file? I don't really have much experience with byte arrays, readers, streams and all that – mchl45 Jul 10 '19 at 20:54
  • It depends on how many bytes are in the CSV file. If it is less than 1024 bytes, then yes, it can be read in one loop iteration. If you have many lines but together they are still smaller than the buffer size, you can end up processing multiple lines at once. If you don't want to use a reader at all, just store everything as a String, split it on the line separator, and do a for loop over the result. – Compass Jul 10 '19 at 22:27
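
To illustrate the last comment, here is a minimal sketch that compares the chunked loop from the question with the "collect everything, then split" approach. The class name, the method names, the tiny 8-byte buffer, and the UTF-8 charset are all assumptions made for the demonstration; they are not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the question's code.
final class ChunkVersusLineDemo {

    // Mirrors the inner loop from the question: one String per read() call.
    // The chunk boundaries depend on the buffer size, not on line breaks.
    static List<String> readInChunks(InputStream in, int bufferSize) throws IOException {
        List<String> chunks = new ArrayList<>();
        byte[] buffer = new byte[bufferSize];
        for (int len = in.read(buffer); len > 0; len = in.read(buffer)) {
            chunks.add(new String(buffer, 0, len, StandardCharsets.UTF_8)); // UTF-8 is an assumption
        }
        return chunks;
    }

    // Collects the whole stream in memory first, then splits on line breaks.
    static List<String> readAllThenSplit(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        for (int len = in.read(buffer); len > 0; len = in.read(buffer)) {
            collected.write(buffer, 0, len);
        }
        String content = new String(collected.toByteArray(), StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String line : content.split("\\R")) { // \R matches any line break (Java 8+)
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "a,b,c\nd,e,f\ng,h,i\n".getBytes(StandardCharsets.UTF_8);
        // A deliberately small buffer makes the mid-line splits visible.
        System.out.println(readInChunks(new ByteArrayInputStream(csv), 8));
        System.out.println(readAllThenSplit(new ByteArrayInputStream(csv)));
    }
}
```

With the 8-byte buffer, the 18 bytes of sample data come back as three chunks that cut through the lines, while the second method returns the three CSV rows. Decoding each chunk separately can also split a multi-byte character across two Strings, which is another reason to decode only after all bytes are collected or to use a Reader.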

1 Answer


BufferedReader is the Java class that can read a Reader line by line, and the glue you need is InputStreamReader, which turns an InputStream into a Reader. You can then wrap the ZipInputStream as

BufferedReader br = new BufferedReader(new InputStreamReader(zis));

(preferably in a try-with-resources block), and the classic loop for reading from a BufferedReader looks like this:

String line;
while ((line = br.readLine()) != null) {
    // process one line here
}
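
Putting the pieces together with the entry filter from the question, a minimal sketch could look like the following. It takes a plain InputStream (in the question that would be att.getRawContent()) so that it runs without the SOAP classes. The class name, the method names, the in-memory sample zip, the UTF-8 charset, and the early break after the wanted entry are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the question's code.
final class ZipCsvLineReader {

    // Returns the lines of the entry called wantedName, or an empty list if it is absent.
    static List<String> readLinesOfEntry(InputStream zipData, String wantedName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                if (entry.isDirectory() || !entry.getName().equals(wantedName)) {
                    continue; // not the file we want
                }
                // The reader is not closed on its own: closing it would close zis,
                // and the enclosing try-with-resources already takes care of that.
                BufferedReader br = new BufferedReader(
                        new InputStreamReader(zis, StandardCharsets.UTF_8)); // charset is an assumption
                String line;
                while ((line = br.readLine()) != null) {
                    lines.add(line); // the question would call processCSVString(line) here
                }
                break; // assumption: only one entry is of interest
            }
        }
        return lines;
    }

    // Builds a small three-entry zip in memory so the sketch can be run without a web service.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            String[][] entries = {
                {"before.txt", "ignore me\n"},
                {"FileIwant.csv", "a,b,c\nd,e,f\n"},
                {"after.txt", "ignore me too\n"},
            };
            for (String[] e : entries) {
                zos.putNextEntry(new ZipEntry(e[0]));
                zos.write(e[1].getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readLinesOfEntry(new ByteArrayInputStream(sampleZip()), "FileIwant.csv");
        lines.forEach(System.out::println);
    }
}
```

The reader stops at the end of the wanted entry because ZipInputStream reports end-of-stream there, so lines from the other entries never show up in the result.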
tevemadar
  • but "zis" is the whole ZipInputStream, which contains several files inside. I just want the file "FileIwant.csv" – mchl45 Jul 10 '19 at 20:31
  • @mchl45, ZipInputStream re-initializes itself for each entry in the .zip file. So it is not a single stream, but a per-entry stream. When you read it to the end, that is just the end of the current entry; it will not mistakenly read all the remaining files. I assume that is your concern. – tevemadar Jul 10 '19 at 22:09
  • so every time I say zis.getNextEntry(), then zis points to the next file? – mchl45 Jul 10 '19 at 22:40
  • @mchl45 yes. While the docs of ZipInputStream are rather short, this is one thing they do say: [*Reads the next ZIP file entry and positions the stream at the beginning of the entry data.*](https://docs.oracle.com/javase/7/docs/api/java/util/zip/ZipInputStream.html#getNextEntry()). The real thing to worry about is `close()`, as it closes the entire stream, not just the "current" entry, but here you wanted to extract a single file anyway. Also, sorry about the late reply, it was night here and I was sleeping. – tevemadar Jul 11 '19 at 08:50
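
If more than one entry ever has to be read line by line, the `close()` caveat starts to matter. One possible workaround, sketched here under assumptions, is to hand the reader a thin wrapper whose `close()` does nothing, so each entry can get its own try-with-resources reader. `NonClosingInputStream`, `readAllEntries` and `zipOf` are hypothetical names invented for this sketch, and UTF-8 is assumed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the thread's code.
final class PerEntryReaders {

    // Hypothetical helper: forwards reads, but ignores close() so the zip stream stays open.
    static final class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // intentionally empty: the owner of the underlying stream closes it
        }
    }

    // Reads every non-directory entry line by line, keyed by entry name in archive order.
    static Map<String, List<String>> readAllEntries(InputStream zipData) throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(zipData)) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                if (entry.isDirectory()) {
                    continue;
                }
                List<String> lines = new ArrayList<>();
                // Closing this reader only closes the wrapper, not zis itself.
                try (BufferedReader br = new BufferedReader(new InputStreamReader(
                        new NonClosingInputStream(zis), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        lines.add(line);
                    }
                }
                result.put(entry.getName(), lines);
            }
        }
        return result;
    }

    // Builds a zip in memory from alternating name/content arguments (demo data only).
    static byte[] zipOf(String... namesAndContents) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                zos.putNextEntry(new ZipEntry(namesAndContents[i]));
                zos.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipOf("one.csv", "a,b\nc,d\n", "two.csv", "e,f\n");
        System.out.println(readAllEntries(new ByteArrayInputStream(zip)));
    }
}
```

Each reader sees only the lines of its own entry, because the stream reports end-of-stream at the entry boundary and `getNextEntry()` then repositions it at the next entry's data.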