Guava Resources.readLines() for Zip/Gzip files

Question

I've found Resources.readLines() and Files.readLines() to be helpful in simplifying my code.
The problem is that I often read gzip-compressed text files, or text files inside zip archives, from URLs (HTTP and FTP).
Is there a way to use Guava's methods to read from these URLs too? Or is that only possible with Java's GZIPInputStream/ZipInputStream?

If you're on Java 8 then you can use `BufferedReader#lines()`. — Ben Manes, Aug 15 '15 at 06:59
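The plain-JDK route mentioned in this comment can be sketched as follows for the gzip case. This is only an illustration under stated assumptions: the class name `GzipLines` and the helpers `readGzipLines` and `gzip` are made up for the example, and the data is compressed in memory so it runs without a network connection. For a real URL, the stream would come from something like `url.openStream()`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLines {
    /** Reads all lines from a gzip-compressed stream. Closing the reader closes the wrapped streams. */
    public static List<String> readGzipLines(InputStream gzipped, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), charset))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    /** Demo helper (hypothetical): gzips a string in memory. */
    public static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(charset));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("first\nsecond\n", StandardCharsets.UTF_8);
        // prints [first, second]
        System.out.println(readGzipLines(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

Note that `BufferedReader#lines()` wraps read failures in `UncheckedIOException`, so errors in the middle of the stream surface as unchecked exceptions.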

score 4 · Accepted Answer · edited Jan 07 '22 at 18:04

You can create your own ByteSources:

For GZip:

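import com.google.common.io.ByteSource;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Wraps another ByteSource and decompresses its gzip data on the fly.
public class GzippedByteSource extends ByteSource {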
  private final ByteSource source;
  public GzippedByteSource(ByteSource gzippedSource) { source = gzippedSource; }
  @Override public InputStream openStream() throws IOException {
    return new GZIPInputStream(source.openStream());
  }
}

Then use it:

Charset charset = ... ;
new GzippedByteSource(Resources.asByteSource(url)).toCharSource(charset).readLines();

Here is the implementation for Zip. It assumes that you read only one entry per source.

public static class ZipEntryByteSource extends ByteSource {
  private final ByteSource source;
  private final String entryName;
  public ZipEntryByteSource(ByteSource zipSource, String entryName) {
    this.source = zipSource;
    this.entryName = entryName;
  }
  @Override public InputStream openStream() throws IOException {
    final ZipInputStream in = new ZipInputStream(source.openStream());
    while (true) {
      final ZipEntry entry = in.getNextEntry();
      if (entry == null) {
        in.close();
        throw new IOException("No entry named " + entryName);
      } else if (entry.getName().equals(this.entryName)) {
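        return new InputStream() {
          @Override
          public int read() throws IOException {
            return in.read();
          }

          // Delegate bulk reads too; otherwise every byte goes through read().
          @Override
          public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
          }

          @Override
          public void close() throws IOException {
            in.closeEntry();
            in.close();
          }
        };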
      } else {
        in.closeEntry();
      }
    }
  }
}

And you can use it like this:

Charset charset = ... ;
String entryName = ... ; // Name of the entry inside the zip file.
new ZipEntryByteSource(Resources.asByteSource(url), entryName).toCharSource(charset).readLines();
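To sanity-check the entry lookup without a network connection or Guava on the classpath, the same scan-until-the-name-matches logic can be exercised against an in-memory archive. This is a JDK-only sketch, not the Guava-based class above: `ZipEntryLines`, `readEntryLines` and `zipOf` are illustrative names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryLines {
    /** Returns the lines of the entry called entryName, or throws if no such entry exists. */
    public static List<String> readEntryLines(InputStream zipped, String entryName, Charset charset)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (entry.getName().equals(entryName)) {
                    // ZipInputStream reports end-of-stream at the end of the current entry,
                    // so the reader stops there instead of running into the next entry.
                    BufferedReader reader = new BufferedReader(new InputStreamReader(zip, charset));
                    return reader.lines().collect(Collectors.toList());
                }
            }
            throw new FileNotFoundException("No entry named " + entryName);
        }
    }

    /** Demo helper (hypothetical): builds a zip in memory from name/text pairs. */
    public static byte[] zipOf(Charset charset, String... namesAndTexts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (int i = 0; i < namesAndTexts.length; i += 2) {
                out.putNextEntry(new ZipEntry(namesAndTexts[i]));
                out.write(namesAndTexts[i + 1].getBytes(charset));
                out.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset utf8 = StandardCharsets.UTF_8;
        byte[] archive = zipOf(utf8, "a.txt", "alpha\n", "b.txt", "one\ntwo\n");
        // prints [one, two]
        System.out.println(readEntryLines(new ByteArrayInputStream(archive), "b.txt", utf8));
    }
}
```

Because a zip read as a stream has to be scanned from the start, looking up a late entry still decompresses everything before it. That is a property of `ZipInputStream`, and it applies to the ByteSource version as well.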

`GzipInputStream` should be `GZIPInputStream` — nezda, Dec 17 '17 at 15:06

score 1 · Answer 2 · answered Aug 14 '15 at 17:30

As Olivier Grégoire said, you can create the necessary ByteSources for whatever compression scheme you need in order to use Guava's readLines function.

For zip archives, though, I don't think it's worth the effort, even though it is possible. It is easier to write your own readLines method that iterates over the zip entries and reads the lines of each one. Here's a class that reads and prints the lines of every entry in a zip archive at a given URL:

public class ReadLinesOfZippedUrl {
    public static List<String> readLines(String urlStr, Charset charset) {
        List<String> retVal = new LinkedList<>();
        try (ZipInputStream zipInputStream = new ZipInputStream(new URL(urlStr).openStream())) {
            for (ZipEntry zipEntry = zipInputStream.getNextEntry(); zipEntry != null; zipEntry = zipInputStream.getNextEntry()) {
                // don't close this reader or you'll close the underlying zip stream
                BufferedReader reader = new BufferedReader(new InputStreamReader(zipInputStream, charset));
                retVal.addAll(reader.lines().collect(Collectors.toList())); // slurp all the lines from one entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return retVal;
    }

    public static void main(String[] args) {
        String urlStr = "http://central.maven.org/maven2/com/google/guava/guava/18.0/guava-18.0-sources.jar";
        Charset charset = StandardCharsets.UTF_8;
        List<String> lines = readLines(urlStr, charset);
        lines.forEach(System.out::println);
    }
}
