I'm trying to parse multiple gzipped JSON files contained in a single zip file, reading them through an InputStream from an HTTP connection.
I've managed to read the first file, but none of the others. Sometimes even that fails, and the first file is not read completely. I have checked the Content-Length header on the connection, and it is the same even when the read is incomplete.
I'm using Google App Engine, which doesn't allow me to save files locally, as most of the examples I've found do.
I'm using ZipArchiveInputStream from Apache Commons Compress (https://commons.apache.org/proper/commons-compress/) to read the zip file.
This is the most closely related question I've been able to find: "How to read from file containing multiple GzipStreams".
    private static ArrayList<RawEvent> parseAmplitudeEventArchiveData(HttpURLConnection connection)
            throws IOException, ParseException {
        String name, line;
        ArrayList<RawEvent> events = new ArrayList<>();
        try (ZipArchiveInputStream zipInput =
                new ZipArchiveInputStream(connection.getInputStream(), null, false, true);) {
            ZipArchiveEntry zipEntry = zipInput.getNextZipEntry();
            if (zipEntry != null) {
                try(GZIPInputStream gzipInputStream = new GZIPInputStream(connection.getInputStream());
                        BufferedReader reader = new BufferedReader(new InputStreamReader(gzipInputStream))) {
                    name = zipEntry.getName();
                    log.info("Parsing file: " + name);
                    while ((line = reader.readLine()) != null) {
                        events.add(parseJsonLine(line));
                    }
                    log.info("Events size: " + events.size());
                }
            }
        }
        return events;
    }
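
For comparison, here is a minimal sketch of the pattern I think I need: loop over every zip entry, and wrap the zip stream itself (not the raw connection stream) in a GZIPInputStream. Two things in the code above look suspicious to me: the single `if` only ever visits one entry, and the gzip stream reads from connection.getInputStream() directly, so it may compete with the zip stream for the same bytes, which could explain the truncated first file. To keep the sketch dependency-free it uses java.util.zip.ZipInputStream from the JDK rather than Commons Compress. The class name GzipInZipReader and the method readAllLines are hypothetical, and it returns raw lines instead of calling my parseJsonLine.

```java
import java.io.BufferedReader;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: reads all lines from every gzipped entry of a zip
// stream, entirely in memory (no local files are written).
public class GzipInZipReader {

    public static List<String> readAllLines(InputStream source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zipInput = new ZipInputStream(source)) {
            ZipEntry entry;
            // A loop rather than a single `if`, so every entry is visited.
            while ((entry = zipInput.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Shield the zip stream: closing the per-entry reader must
                // not close the archive stream that later entries still need.
                InputStream entryStream = new FilterInputStream(zipInput) {
                    @Override
                    public void close() {
                        // Intentionally a no-op.
                    }
                };
                // Decompress from the zip stream, which ends at the entry boundary.
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        new GZIPInputStream(entryStream), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.add(line);
                    }
                }
            }
        }
        return lines;
    }
}
```

With Commons Compress, I would expect the same structure to apply, with the loop calling zipInput.getNextZipEntry() and the GZIPInputStream wrapping zipInput, but I have not verified that variant.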