I am trying to extract a bz2 file as shown below. This is a test class that I wrote, and I know the result is a .txt file when uncompressed. But when I actually read files from the server, the uncompressed bz2 file can be anything: HTML, tar, tgz, or text. How can I make this code generic so that it works for any kind of file?

I want to uncompress different files: test.txt.bz2 should uncompress to test.txt, and 6223.webvis.html_20130803195241.bz2 to 6223.webvis.html_20130803195241. How can I make my code generic so that it works for both scenarios?

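// Requires Apache Commons Compress on the classpath.
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

// try-with-resources closes both streams even if reading or writing fails
try (BZip2CompressorInputStream bzIn = new BZip2CompressorInputStream(
         new BufferedInputStream(new FileInputStream("C:\\temp\\test.txt.bz2")));
     FileOutputStream out = new FileOutputStream("C:\\temp\\test.txt")) {
    final byte[] buffer = new byte[1024];
    int n;
    // read() returns -1 at the end of the decompressed stream
    while (-1 != (n = bzIn.read(buffer))) {
        out.write(buffer, 0, n);
    }
} catch (IOException e) {
    throw new Error(e.getMessage(), e);
}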

Thanks, Akshitha.

Akshitha

2 Answers

The normal pattern is that a file named x gets saved as x.bz2, so the output file name is the input file name with the last four characters removed. The usual exception is tar archives: x.tar -> x.tbz (or x.tbz2), though some people use x.tar.bz2.

This means your example doesn't follow the normal pattern; otherwise it would be test.txt.bz2.

Joshua
  • I described it incorrectly. I have files such as test.txt, which is a bz2 archive, i.e. test.txt.bz2, and 123.test.html_2013.bz2, which are different files. What should I pass as the argument to the output stream? – Akshitha Aug 18 '14 at 16:10
  • Since I do not believe that the uncompressed filename is 123.test.html_2013, I think you are doing it wrong and therefore have the impossible question to ask. – Joshua Aug 18 '14 at 16:16
  • This is my actual file name 6223.webvis.html_20130803195241.bz2 that I was given – Akshitha Aug 18 '14 at 16:36
  • Talk to the guy who is giving you these file names. – Joshua Aug 18 '14 at 16:42
  • I did and he mentioned that there are webvis.html files – Akshitha Aug 18 '14 at 16:55
  • Ok you have a local naming convention webvis.html -> garbage.webvis.html_datetime.bz2 and you asked on stackoverflow what the original name is. I would suggest you reconsider the very thing you are trying to do. You certainly can't unpack them all as webvis.html. – Joshua Aug 18 '14 at 17:09
  • Sorry, I didn't quite follow. Do you mean we cannot uncompress such files? – Akshitha Aug 18 '14 at 17:33
  • You cannot uncompress two files to the same file name for the obvious reason. – Joshua Aug 18 '14 at 17:36
  • I want to uncompress different files, if it is test.txt.bz2, then uncompress to test.txt and 6223.webvis.html_20130803195241.bz2 to 6223.webvis.html_20130803195241. How can I make my code generic such that it will work for these two different scenarios. – Akshitha Aug 18 '14 at 17:52
  • By removing the last four characters. – Joshua Aug 18 '14 at 17:53

A BZ2 archive does not know anything about the original name. The usual way to do it is to compress file.ext as file.ext.bz2, so you get the output file name from the archive name.

String inFile = "test.bz2";
String outFile = inFile.substring(0, inFile.length() - 4);
// outFile == "test"
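
To cover both file names from the question, and the tar short forms mentioned in the other answer, the suffix stripping could be wrapped in a small helper. This is only a sketch: `Bz2Names` and `outputNameFor` are made-up names, and the `.out` fallback for names without a recognized suffix is my own assumption, not anything bzip2 prescribes.

```java
import java.util.Locale;

final class Bz2Names {
    /** Hypothetical helper: derives the output file name from a bzip2 archive name. */
    static String outputNameFor(String archiveName) {
        String lower = archiveName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".tbz2")) {
            // x.tbz2 is shorthand for x.tar.bz2
            return archiveName.substring(0, archiveName.length() - 5) + ".tar";
        }
        if (lower.endsWith(".tbz")) {
            // x.tbz is shorthand for x.tar.bz2 as well
            return archiveName.substring(0, archiveName.length() - 4) + ".tar";
        }
        if (lower.endsWith(".bz2")) {
            // normal case: drop the last four characters
            return archiveName.substring(0, archiveName.length() - 4);
        }
        // Assumption: unknown suffix, so append one rather than overwrite the input file
        return archiveName + ".out";
    }

    public static void main(String[] args) {
        System.out.println(outputNameFor("test.txt.bz2"));
        System.out.println(outputNameFor("6223.webvis.html_20130803195241.bz2"));
    }
}
```

The result can then be passed to the `FileOutputStream` constructor in place of the hard-coded path. The decompression loop itself does not need to change, because bzip2 does not care what kind of data it wraps.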
Gabriel Negut