I want to save the contents of a tar.gz archive inside a database table.

The archive contains txt files in CSV format.

The idea is to insert a new line in the database for each line in the txt files.

The problem is that I can't read the contents of one file on its own and then move on to the next file.

Below EntryTable and EntryTableLine are Hibernate entities.

EntryTable is in a OneToMany relationship with EntryTableLine (a file -EntryTable- can have many lines -EntryTableLine-).

public static final int TAB = 9;

FileInputStream fileInputStream = new FileInputStream(fileLocation);
GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
TarArchiveInputStream tar = new TarArchiveInputStream(gzipInputStream);

BufferedReader reader = new BufferedReader(new InputStreamReader(tar));
// Columns are delimited with TAB
CSVFormat csvFormat = CSVFormat.TDF.withHeader().withDelimiter((char) TAB);
CSVParser parser = new CSVParser(reader, csvFormat);

TarArchiveEntry tarEntry = tar.getNextTarEntry();

while(tarEntry != null){
  EntryTable entryTable = new EntryTable();
  entryTable.setFilename(tarEntry.getName());

  if(reader != null){

     // Here is the problem
     for(CSVRecord record : parser){
        //this could have been a StringBuffer
        String line = "";
        int i = 1;
        for(String val : record){
           // Append each column instead of overwriting the previous one
           line += "<column" + i + ">" + val + "</column" + i + ">";
           i++;
        }

        EntryTableLine entryTableLine = new EntryTableLine();
        entryTableLine.setContent(line);
        entryDao.saveLine(entryTableLine);
      }
  }
  tarEntry = tar.getNextTarEntry();
}

I tried converting tarEntry.getFile() to InputStream, but tarEntry.getFile() is null unfortunately.

Let's say I have 4 files in the archive. Each file has 3 lines inside. However, in the database, some entries have 5 lines while others have none.

Thank you!

alepuzio
happy songs
  • I believe you need to read from the TarArchiveInputStream after each call to getNextTarEntry. – VGR Apr 01 '19 at 13:52
  • As the documentation of [TarArchiveEntry.getFile()](https://commons.apache.org/proper/commons-compress/javadocs/api-1.18/org/apache/commons/compress/archivers/tar/TarArchiveEntry.html#getFile--) states: "_This method is only useful for entries created from a File but not for entries read from an archive._". The documentation's example page contains some code showing [how to read a TAR archive](https://commons.apache.org/proper/commons-compress/examples.html#tar). – vanje Apr 01 '19 at 14:05
  • I was not reading the InputStream correctly. I managed to read the content of each file after doing something similar to the example "how to read a TAR archive". Thanks :D – happy songs Apr 03 '19 at 14:38

3 Answers

You can use the TarArchiveInputStream from Apache Commons Compress, as shown below (Reference):

TarArchiveInputStream input = new TarArchiveInputStream(new GzipCompressorInputStream(new FileInputStream("C:\\Users\\User\\Desktop\\Books\\test\\CoverLetter-Version2.gz")));
TarArchiveEntry entry = input.getNextTarEntry();
while (entry != null) {
    // Create a new reader for each entry; reads stop at the end of the current entry
    BufferedReader br = new BufferedReader(new InputStreamReader(input)); // Read directly from the tar stream
    System.out.println("For File = " + entry.getName()); // name of the file inside the tar
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println("line=" + line);
    }
    // Do not close br here: that would also close the underlying tar stream
    entry = input.getNextTarEntry();
}
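
The per-entry reading step can be pulled out into a small helper. This is only a sketch: `EntryLines` and `readCurrentEntry` are hypothetical names, UTF-8 is an assumed encoding, and a `ByteArrayInputStream` stands in for the tar stream so the example runs with the JDK alone. It relies on the archive stream reporting end-of-stream at the end of the current entry.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons Compress.
final class EntryLines {

    // Reads every line up to end-of-stream. For an archive stream,
    // end-of-stream is the end of the current entry.
    static List<String> readCurrentEntry(InputStream in) {
        // A fresh reader per call, so nothing buffered for an earlier entry leaks in.
        // The reader is deliberately not closed: closing it would close the stream it wraps.
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8)); // assumed encoding
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for a single archive entry with two tab-delimited lines.
        InputStream fakeEntry =
                new ByteArrayInputStream("a\tb\nc\td\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readCurrentEntry(fakeEntry)) {
            System.out.println("line=" + line);
        }
    }
}
```

In the loop above, this would be called once per entry with the tar stream as the argument, right after each `getNextTarEntry()` returns a non-null entry.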
Yug Singh

Try reading directly from the input stream:

        BufferedReader br = null;
        while(tarEntry != null){
            // Wrap the tar stream itself, not the entry
            br = new BufferedReader(new InputStreamReader(tar));
            tarEntry = tar.getNextTarEntry();
        }
pethryth

Doing something similar to this solved the problem:

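TarArchiveEntry entry = tarInput.getNextTarEntry();
// getSize() returns a long, so it is cast for the array size
byte[] content = new byte[(int) entry.getSize()];
int offset = 0;
// Loop until entry.getSize() bytes have been read
while (offset < content.length) {
    int read = tarInput.read(content, offset, content.length - offset);
    if (read == -1) break; // entry ended early
    offset += read;
}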

Reference mentioned in the comments
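
For completeness, here is a rough, JDK-only version of that read loop. `EntryBytes` and `readFully` are hypothetical names of mine, and the sketch assumes the entry is small enough to fit in a single byte array. It works on any `InputStream`, which includes the tar stream positioned at an entry.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Commons Compress.
final class EntryBytes {

    // Reads exactly `size` bytes from the stream, looping because a single
    // read() call may return fewer bytes than requested.
    static byte[] readFully(InputStream in, long size) {
        if (size < 0 || size > Integer.MAX_VALUE - 8) {
            throw new IllegalArgumentException("size does not fit in a byte array: " + size);
        }
        byte[] content = new byte[(int) size];
        int offset = 0;
        try {
            while (offset < content.length) {
                int n = in.read(content, offset, content.length - offset);
                if (n == -1) {
                    // The stream ended before the announced size was reached.
                    throw new EOFException("stream ended after " + offset + " of " + size + " bytes");
                }
                offset += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content;
    }
}
```

With the tar stream this would be called as `EntryBytes.readFully(tarInput, entry.getSize())`. On Java 9 and later, `InputStream.readNBytes(byte[], int, int)` performs a similar loop.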

happy songs