I'm trying to parse multiple gzipped JSON files that are contained in one zip file, reading them through an InputStream from an HTTP connection.

I've managed to read the first file, but no more than that. Sometimes even that fails and the first file is not read completely. I have checked the Content-Length header on the connection, and it is the same even when the read comes up short.

I'm using Google App Engine, which doesn't allow me to save files locally, as most of the examples I've found do.

I'm using ZipArchiveInputStream from https://commons.apache.org/proper/commons-compress/ for the Zip file.

This is the most closely related question I've been able to find: How to read from file containing multiple GzipStreams

private static ArrayList<RawEvent> parseAmplitudeEventArchiveData(HttpURLConnection connection)
        throws IOException, ParseException {
    String name, line;
    ArrayList<RawEvent> events = new ArrayList<>();

    try (ZipArchiveInputStream zipInput =
                 new ZipArchiveInputStream(connection.getInputStream(), null, false, true);) {

        ZipArchiveEntry zipEntry = zipInput.getNextZipEntry();
        if (zipEntry != null) {

            try(GZIPInputStream gzipInputStream = new GZIPInputStream(connection.getInputStream());
            BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) {

                name = zipEntry.getName();
                log.info("Parsing file: " + name);

                while ((line = reader.readLine()) != null) {
                    events.add(parseJsonLine(line));
                }
                log.info("Events size: " + events.size());
            }
        }
    }
    return events;
}
Jitan
  • I wonder how this can work because you use the input stream from the connection for the GZIPInputStream. But what you really want is to read the data for the ZipArchiveInputStream and create a GZIPInputStream from this data. – Martin Mar 26 '16 at 17:43
  • @MartinKrüger yes I've been wondering the same... if I switch it out as you suggest I get an "IOException: Truncated ZIP file " – Jitan Mar 26 '16 at 17:51

1 Answer

This works for me:

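import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.text.ParseException; // assumption: the original does not say which ParseException is meant
import java.util.zip.GZIPInputStream;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;

public class UnzipZippedFiles {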

    public static void main(String[] args) throws IOException, ParseException {
        FileInputStream inputStream = new FileInputStream("/home/me/dev/scratchpad/src/main/resources/files.zip");
        unzipFile(inputStream);
    }

    private static void unzipFile(InputStream inputStream)
            throws IOException, ParseException {
        try (ZipArchiveInputStream zipInput =
                     new ZipArchiveInputStream(inputStream, null, false, true);) {

            ZipArchiveEntry zipEntry;

            while ((zipEntry = zipInput.getNextZipEntry()) != null) {
                System.out.println("File: " + zipEntry.getName());

                byte[] fileBytes = readDataFromZipStream(zipInput, zipEntry);

                ByteArrayInputStream byteIn = new ByteArrayInputStream(fileBytes);
                unzipGzipArchiveAndPrint(byteIn);
            }
        }
    }

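    private static byte[] readDataFromZipStream(ZipArchiveInputStream zipStream, ZipArchiveEntry entry) throws IOException {
        // Relies on the entry size being known; getSize() returns -1 when it is not.
        byte[] data = new byte[(int) entry.getSize()];

        // A single read() may return fewer bytes than requested, so loop until the buffer is full.
        int offset = 0;
        while (offset < data.length) {
            int count = zipStream.read(data, offset, data.length - offset);
            if (count < 0) {
                throw new IOException("Unexpected end of entry: " + entry.getName());
            }
            offset += count;
        }

        return data;
    }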

    private static void unzipGzipArchiveAndPrint(InputStream inputStream) throws IOException {
        System.out.println("Content:");
        try (GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
             BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) {

            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
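
A side note on readDataFromZipStream: it sizes its buffer from entry.getSize(), which can be unknown (-1) when the archive is read as a stream. Below is a minimal sketch of an alternative, under stated assumptions: hand each entry's data straight to GZIPInputStream and read until the end of the entry, so no size is needed. To keep it runnable with the JDK alone, it swaps in java.util.zip.ZipInputStream for ZipArchiveInputStream. NestedGzipSketch, readAllLines and buildSampleZip are made-up names, and the sample zip is built in memory purely for the demo. The same wrapping should carry over to ZipArchiveInputStream, though that is untested here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class NestedGzipSketch {

    /** Reads every line of every gzipped entry in a zip stream, without using entry sizes. */
    static List<String> readAllLines(InputStream source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zipInput = new ZipInputStream(source)) {
            while (zipInput.getNextEntry() != null) {
                // Shield the zip stream: closing the reader below must not close it.
                InputStream entryData = new FilterInputStream(zipInput) {
                    @Override
                    public void close() {
                        // Intentionally empty; the outer try-with-resources closes zipInput.
                    }
                };
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        new GZIPInputStream(entryData), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.add(line);
                    }
                }
            }
        }
        return lines;
    }

    /** Builds a small in-memory zip of gzipped text entries (demo data only). */
    static byte[] buildSampleZip(String... contents) throws IOException {
        ByteArrayOutputStream zipBytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(zipBytes)) {
            for (int i = 0; i < contents.length; i++) {
                // Gzip the text first, then store the gzip bytes as one zip entry.
                ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
                try (GZIPOutputStream gzOut = new GZIPOutputStream(gzBytes)) {
                    gzOut.write(contents[i].getBytes(StandardCharsets.UTF_8));
                }
                zipOut.putNextEntry(new ZipEntry("part-" + i + ".json.gz"));
                zipOut.write(gzBytes.toByteArray());
                zipOut.closeEntry();
            }
        }
        return zipBytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildSampleZip("{\"id\":1}\n{\"id\":2}\n", "{\"id\":3}\n");
        for (String line : readAllLines(new ByteArrayInputStream(zip))) {
            System.out.println(line);
        }
    }
}
```

The no-op close() matters: without it, closing the reader for the first entry would also close the zip stream, and no further entries could be read.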
Martin
  • Problem I get with this is that entry.getSize is returning -1. I suppose there is something about the zip file I have that makes it like this but it works to extract it with the 'unzip' command from the terminal. – Jitan Mar 27 '16 at 09:17
  • What is ZipArchiveEntry? – Ege Kuzubasioglu Dec 25 '17 at 14:24