I've found Resources.readLines() and Files.readLines() helpful in simplifying my code.
The problem is that I often read gzip-compressed text files, or text files inside zip archives, from URLs (HTTP and FTP).
Is there a way to use Guava's methods to read from these URLs too? Or is that only possible with Java's GZIPInputStream/ZipInputStream?
user1775213
- If you're on Java 8 then you can use `BufferedReader#lines()`. – Ben Manes Aug 15 '15 at 06:59
- Ping! I've added a `ByteSource` for Zip in my answer. – Olivier Grégoire Aug 17 '15 at 10:18
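To make the `BufferedReader#lines()` suggestion concrete, here is a minimal JDK-only sketch for the gzip case. The class name `GzipLines` and the helpers `readGzipLines` and `gzip` are hypothetical names made up for illustration, and the in-memory fixture only stands in for a stream you would normally get from a URL.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLines {
    // Hypothetical helper: decompresses the given stream and collects its lines.
    public static List<String> readGzipLines(InputStream compressed, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(compressed), charset))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo fixture (an assumption of this sketch): gzips a string in memory.
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("first\nsecond\n");
        // prints [first, second]
        System.out.println(readGzipLines(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    }
}
```

For a remote file you would pass something like `new URL(urlStr).openStream()` instead of the `ByteArrayInputStream`.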
2 Answers
You can create your own `ByteSource`s.
For GZip:
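    import com.google.common.io.ByteSource;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.GZIPInputStream;

    public class GzippedByteSource extends ByteSource {
        private final ByteSource source;
        public GzippedByteSource(ByteSource gzippedSource) { source = gzippedSource; }
        @Override public InputStream openStream() throws IOException {
            // Decompress on the fly while reading from the wrapped source.
            return new GZIPInputStream(source.openStream());
        }
    }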
Then use it:
    Charset charset = ... ;
    new GzippedByteSource(Resources.asByteSource(url)).asCharSource(charset).readLines();
Here is the implementation for Zip. It assumes that you read only one entry.
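    import com.google.common.io.ByteSource;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    public class ZipEntryByteSource extends ByteSource {
        private final ByteSource source;
        private final String entryName;

        public ZipEntryByteSource(ByteSource zipSource, String entryName) {
            this.source = zipSource;
            this.entryName = entryName;
        }

        @Override public InputStream openStream() throws IOException {
            final ZipInputStream in = new ZipInputStream(source.openStream());
            while (true) {
                final ZipEntry entry = in.getNextEntry();
                if (entry == null) {
                    in.close();
                    // Report the requested name; `entry` is null at this point.
                    throw new IOException("No entry named " + entryName);
                } else if (entry.getName().equals(this.entryName)) {
                    // Reads on a ZipInputStream stop at the end of the current entry.
                    return new InputStream() {
                        @Override
                        public int read() throws IOException {
                            return in.read();
                        }
                        // Delegate bulk reads too, so callers don't fall back to one read() per byte.
                        @Override
                        public int read(byte[] b, int off, int len) throws IOException {
                            return in.read(b, off, len);
                        }
                        @Override
                        public void close() throws IOException {
                            in.closeEntry();
                            in.close();
                        }
                    };
                } else {
                    in.closeEntry();
                }
            }
        }
    }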
And you can use it like this:
    Charset charset = ... ;
    String entryName = ... ; // Name of the entry inside the zip file.
    new ZipEntryByteSource(Resources.asByteSource(url), entryName).asCharSource(charset).readLines();

Jherico

Olivier Grégoire
As Olivier Grégoire said, you can create the necessary `ByteSource`s for whatever compression scheme you need in order to use Guava's `readLines` function.
For zip archives, though, I don't think it's worth it, even if it is possible. It is easier to write your own `readLines` method that iterates over the zip entries and reads the lines of each entry itself. Here's a class that demonstrates how to read and print the lines of a zip archive at a URL:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.UncheckedIOException;
    import java.net.URL;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    public class ReadLinesOfZippedUrl {
        public static List<String> readLines(String urlStr, Charset charset) {
            List<String> retVal = new ArrayList<>();
            try (ZipInputStream zipInputStream = new ZipInputStream(new URL(urlStr).openStream())) {
                for (ZipEntry zipEntry = zipInputStream.getNextEntry(); zipEntry != null; zipEntry = zipInputStream.getNextEntry()) {
                    // don't close this reader or you'll close the underlying zip stream
                    BufferedReader reader = new BufferedReader(new InputStreamReader(zipInputStream, charset));
                    reader.lines().forEach(retVal::add); // slurp all the lines from one entry
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return retVal;
        }

        public static void main(String[] args) {
            String urlStr = "http://central.maven.org/maven2/com/google/guava/guava/18.0/guava-18.0-sources.jar";
            Charset charset = StandardCharsets.UTF_8;
            List<String> lines = readLines(urlStr, charset);
            lines.forEach(System.out::println);
        }
    }
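If you want to try the entry-by-entry loop without a network connection, the following sketch runs the same idea against a small in-memory archive and keeps the lines grouped by entry name. The names `ZipLinesPerEntry`, `readLinesPerEntry` and `sampleZip` are hypothetical, and grouping by entry is an assumption of this sketch rather than part of the class above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipLinesPerEntry {
    // Hypothetical helper: maps each file entry's name to its lines, in archive order.
    public static Map<String, List<String>> readLinesPerEntry(InputStream zipped, Charset charset) {
        Map<String, List<String>> linesByEntry = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                // Not closed on purpose: closing the reader would close the zip stream.
                BufferedReader reader = new BufferedReader(new InputStreamReader(zip, charset));
                linesByEntry.put(entry.getName(), reader.lines().collect(Collectors.toList()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return linesByEntry;
    }

    // Demo fixture (an assumption of this sketch): a two-entry archive built in memory.
    public static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("one\ntwo\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.txt"));
            out.write("three\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // prints {a.txt=[one, two], b.txt=[three]}
        System.out.println(readLinesPerEntry(
                new ByteArrayInputStream(sampleZip()), StandardCharsets.UTF_8));
    }
}
```

This works because a read on a `ZipInputStream` reports end-of-stream at the end of the current entry, so each reader only sees the lines of one entry.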

heenenee