I'm using the Apache Commons Compress example from another post here to extract files from a tar archive, but it fails with:

java.io.IOException: Invalid file path. 

This happens with only some of the VMware OVA files I pass to it (OVA files are tar archives, by the way); other OVA files extract fine.

Here is the code:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

public static void unTar(File tarFile, File dest) throws IOException {
    dest.mkdir();
    // try-with-resources closes the tar stream even if extraction fails
    try (TarArchiveInputStream tarIn = new TarArchiveInputStream(
            new BufferedInputStream(new FileInputStream(tarFile)))) {
        TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
        while (tarEntry != null) {
            // create a file with the same name as the tarEntry
            File destPath = new File(dest, tarEntry.getName());
            System.out.println("working: " + destPath.getCanonicalPath());
            if (tarEntry.isDirectory()) {
                destPath.mkdirs();
            } else {
                destPath.createNewFile();
                byte[] btoRead = new byte[1024];
                // copy the current entry's bytes to the destination file
                try (BufferedOutputStream bout =
                        new BufferedOutputStream(new FileOutputStream(destPath))) {
                    int len;
                    while ((len = tarIn.read(btoRead)) != -1) {
                        bout.write(btoRead, 0, len);
                    }
                }
            }
            tarEntry = tarIn.getNextTarEntry();
        }
    }
}

It looks like the problem is introduced by tarEntry.getName(), at the point where the value of destPath is set. Stepping through with the debugger, I can see destPath picking up extra unprintable characters plus the word "someone" in the path:

target/mybuildname-SNAPSHOT/extractedDirectory/<garbage characters>someone/test.ovf

For OVA files that I can successfully untar, the value of destPath looks normal:

target/mybuildname-SNAPSHOT/extractedDirectory/test.ovf

The "someone" text is a decent clue, since I see it in the headers of both the working and the failing tar (OVA) files when I view them with hexdump -C. However, it is not at the same location in each.

I suspect the solution involves working out the offset at which the filename is stored and reading from that specific offset. That's my best guess, but I'm not very good at reading hex.
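To make the offset guess above easier to check without reading raw hex, here is a minimal sketch that decodes a few fields from the first 512-byte header block. The offsets are those of the POSIX ustar header layout (name at 0, magic at 257, uname at 265, gname at 297, prefix at 345). TarHeaderFields and its method names are hypothetical helpers for inspection only, not part of Commons Compress. Note that the bytes at offset 345 hold a path prefix only in ustar-format headers; other tar dialects may store different data there, and whether that explains these OVA files is an assumption, not something confirmed here.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical inspection helper; not part of Commons Compress.
final class TarHeaderFields {
    static final int BLOCK_SIZE = 512;

    /** Reads the first 512-byte header block of a tar (or OVA) file. */
    static byte[] readFirstHeader(String path) throws IOException {
        byte[] header = new byte[BLOCK_SIZE];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            // throws EOFException if the file is shorter than one block
            in.readFully(header);
        }
        return header;
    }

    /** Decodes a fixed-width field, stopping at the first NUL byte. */
    static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.ISO_8859_1);
    }

    // Offsets and widths from the POSIX ustar header layout.
    static String name(byte[] h)   { return field(h, 0, 100); }
    static String magic(byte[] h)  { return field(h, 257, 6); }
    static String uname(byte[] h)  { return field(h, 265, 32); }
    static String gname(byte[] h)  { return field(h, 297, 32); }
    static String prefix(byte[] h) { return field(h, 345, 155); }
}
```

Calling TarHeaderFields.readFirstHeader("test.ova") and printing name, magic, uname, gname, and prefix for one working and one failing file should show which field the "someone" text sits in and whether the magic field differs between the two.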

It's important to note that my goal is to read the OVF XML file inside the OVA, and that I don't control the creation of the OVA files, so I can't fix the problem in the header beforehand. The OVA files themselves are perfectly functional, and I can successfully untar them from the command line with tar -xvf test.ova. In fact, if I repackage the tar file from the command line, the above code works.

Damon
  • you should try their user list https://commons.apache.org/mail-lists.html – Leo Jan 14 '16 at 18:34
  • Thanks Leo. I mailed them just now. Appreciate the suggestion. If they solve it on the mailer I'll post the solution here. – Damon Jan 14 '16 at 18:43
  • Great. This sounds like a bug in the library actually. You can still download the library source and tweak some classes (so you can solve your problem, maybe you can even contribute to the project) – Leo Jan 14 '16 at 18:51
  • So I have a workaround for achieving what I need, but I wouldn't say it qualifies as a real solution. If your goal is to extract the ovf, you can work around this by cutting out the extra characters. Adding this line: String filename = new File(tarEntry.getName()).getName(); before File destPath = new File(dest, tarEntry.getName()); and building destPath from filename instead avoids the exception when calling destPath.createNewFile(). However, that doesn't help with use cases where there are nested directories inside the tar. – Damon Jan 15 '16 at 02:01
  • Haven't seen your mail yet, @Damon; it's likely stuck in the moderation queue. It may be better to open a JIRA ticket at https://issues.apache.org/jira/browse/COMPRESS anyway, since this looks like a bug, or maybe the OVA files are using a dialect of tar not fully supported by CC. Be prepared to be asked for a sample that exhibits the problem. – Stefan Bodewig Jan 15 '16 at 12:56
  • @Damon I can't find your email or issue, but I'm having the same problem described here (also with OVA files). Did you ever find a solution? I'm on 1.11, so I'm going to update to 1.12 and see if it's fixed there; the changelog shows a couple of changes to the tar parsing, so I'm hopeful. – Ricket Oct 27 '16 at 23:59
  • It is not fixed as of version 1.12 which is the latest version. – Ricket Oct 28 '16 at 00:26
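
The base-name workaround described in the comments above can be sketched as follows. EntryNames, baseName, and resolveFlat are hypothetical names introduced only for illustration. As noted in the comments, this flattens every entry into the destination directory, so it is only a stopgap for archives without nested directories and does not address the underlying header parsing.

```java
import java.io.File;

// Hypothetical helper illustrating the workaround from the comments.
final class EntryNames {
    /** Returns only the last path segment of a tar entry name. */
    static String baseName(String entryName) {
        return new File(entryName).getName();
    }

    /** Resolves an entry directly under dest, discarding any leading path. */
    static File resolveFlat(File dest, String entryName) {
        return new File(dest, baseName(entryName));
    }
}
```

Inside the extraction loop, File destPath = EntryNames.resolveFlat(dest, tarEntry.getName()); would replace the original destPath assignment.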

0 Answers