I am working with RDF models at the moment: I query data from a database, generate models with Apache Jena, and then work with them. However, I don't want to have to run the queries again every time I need the models, so I thought about storing them locally. For context, each model is built up roughly like this before it ends up in a map (a simplified sketch with placeholder URIs; the real statements come from the database query):
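Model m = ModelFactory.createDefaultModel();
// Placeholder resource and property; in reality these come from the query results.
Resource subject = m.createResource("http://example.org/resource/1");
Property name = m.createProperty("http://example.org/property/name");
m.add(subject, name, "Example resource");

Map<String, Model> models = new HashMap<>();
models.put("resource-1", m);

The models are quite big, so I'd like to compress them using Apache Commons Compress. This works so far (try-catch blocks omitted):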
public static void write(Map<String, Model> models, String file){
    logger.info("Writing models to file " + file);
    TarArchiveOutputStream tarOutput = null;
    TarArchiveEntry entry = null;
    tarOutput = new TarArchiveOutputStream(new GzipCompressorOutputStream(new FileOutputStream(new File(file))));
    for(Map.Entry<String, Model> e : models.entrySet()) {
        logger.info("Packing model " + e.getKey());
        // Convert Model
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        RDFDataMgr.write(baos, e.getValue(), RDFFormat.RDFXML_PRETTY);
        // Prepare Entry
        entry = new TarArchiveEntry(e.getKey());
        entry.setSize(baos.size());
        tarOutput.putArchiveEntry(entry);
        // write into file and close
        tarOutput.write(baos.toByteArray());
        tarOutput.closeArchiveEntry();
    }
    tarOutput.close();
}
But when I try the other direction, I get weird NullPointerExceptions. Is this a bug in the GZip implementation, or is my understanding of streams wrong?
public static Map<String, Model> read(String file){
    logger.info("Reading models from file " + file);
    Map<String, Model> models = new HashMap<>();
    TarArchiveInputStream tarInput = new TarArchiveInputStream(new GzipCompressorInputStream(new FileInputStream(file)));
    for(TarArchiveEntry currentEntry = tarInput.getNextTarEntry(); currentEntry != null; currentEntry = tarInput.getNextTarEntry()){
        logger.info("Processing model " + currentEntry.getName());
        // Read the current model
        Model m = ModelFactory.createDefaultModel();
        m.read(tarInput, null);
        // And add it to the output
        models.put(currentEntry.getName(), m);
        tarInput.close();
    }
    return models;
}
This is the stack trace:
Exception in thread "main" java.lang.NullPointerException
at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:271)
at java.io.InputStream.skip(InputStream.java:224)
at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:106)
at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.skipRecordPadding(TarArchiveInputStream.java:345)
at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:272)
at de.mem89.masterthesis.rdfHydra.StorageHelper.read(StorageHelper.java:88)
at de.mem89.masterthesis.rdfHydra.StorageHelper.main(StorageHelper.java:124)
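For completeness, this is roughly what StorageHelper.main does (simplified; createModels is a stand-in for the code that runs the database queries, and the file name is just an example):

public static void main(String[] args) {
    Map<String, Model> models = createModels(); // stand-in for the actual query code
    write(models, "models.tar.gz");              // writing works fine
    Map<String, Model> restored = read("models.tar.gz"); // this call produces the NullPointerException above
}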