9

I have 27 large bz2 archive files (several GB each) that I need to combine into one bz2 archive. Decompressing them and then creating a new archive from the result is not an option for me: compressed, all 27 files add up to about 100GB; uncompressed, it's about 5-6TB (yes, that's TERAbytes lol).

Can this be done with some sort of script, or is there another compression format that makes this easier?

dmn

4 Answers

36

You can simply concatenate several bz2 files into a single bz2 file, like this:

$ cat file1.bz2 file2.bz2 file3.bz2 >resulting_file.bz2

bzip2 and other utilities like lbzip2 will be able to decompress the resulting file as expected.
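Before running this on 100GB of input, it may be worth checking the behaviour on small samples. The following is only a sketch: the file names `part1.txt`, `part2.txt` and `combined.bz2` are made up for the demo, and it assumes `bzip2`, `mktemp` and `cmp` are available.

```shell
# Work in a scratch directory; all file names below are hypothetical samples.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

printf 'first\n'  > part1.txt
printf 'second\n' > part2.txt

# Compress each file separately, keeping the originals (-k).
bzip2 -k part1.txt part2.txt

# Concatenate the compressed streams; nothing is decompressed here.
cat part1.txt.bz2 part2.txt.bz2 > combined.bz2

# Integrity-test the result, then compare its contents with the
# plain concatenation of the original files.
bzip2 -t combined.bz2
cat part1.txt part2.txt > expected.txt
bzip2 -dc combined.bz2 | cmp - expected.txt && echo "contents match"
```

The combined file is simply several bz2 streams back to back, so its size is the sum of the input sizes and no extra disk space is needed beyond that.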

  • 2
    In fact, it works! From man bzip2: `bunzip2 will correctly decompress a file which is the concatenation of two or more compressed files. The result is the concatenation of the corresponding uncompressed files. Integrity testing (-t) of concatenated compressed files is also supported.` – ventura10 Aug 18 '14 at 10:00
5

If you're willing to burn a few days of CPU, here's one solution with the magical pipe facility of modern UNIX(R) operating systems:

bzip2 -dc file*.bz2 | bzip2 >resulting_file.bz2

... actually, grab lbzip2 version 2.0 and do the same, except with lbzip2, on a multicore machine:

lbzip2 -dc file*.bz2 | lbzip2 >resulting_file.bz2
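A small rehearsal of the same pipeline, as a sketch only: the sample archives `file1.bz2` and `file2.bz2` are created here just for the demo, and plain `bzip2` is used so it runs without `lbzip2` installed. It adds an integrity test of the inputs first, since a corrupt input would otherwise only surface hours into the run.

```shell
# Scratch directory with two tiny sample archives (hypothetical names).
tmp2=$(mktemp -d)
cd "$tmp2" || exit 1
printf 'alpha\n' | bzip2 > file1.bz2
printf 'beta\n'  | bzip2 > file2.bz2

# Check every input before starting the long-running recompression.
bzip2 -t file1.bz2 file2.bz2 || exit 1

# Decompress to stdout and recompress on the fly; the uncompressed
# data only ever exists inside the pipe, never on disk.
bzip2 -dc file*.bz2 | bzip2 > resulting_file.bz2

# Verify the output and print its contents.
bzip2 -t resulting_file.bz2 && bzip2 -dc resulting_file.bz2
```

Note that the output name `resulting_file.bz2` is not matched by the `file*.bz2` glob, so the output cannot be picked up as one of its own inputs.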
lacos
3

You should flip the question around: rather than decompressing and then recompressing the files, simply make a tar archive of all the separate files. tar is ideal as a container for them.

tar cf tarofbzfiles.tar *.bz2
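To illustrate what this produces, here is a sketch on two throwaway archives (`a.bz2` and `b.bz2` are hypothetical names created for the demo). Each compressed file stays a separate member of the tar file, and a single member can still be streamed out and decompressed without unpacking the rest. The `-O` (extract to stdout) option is assumed to be available, as it is in GNU tar and bsdtar.

```shell
# Scratch directory with two tiny sample archives (hypothetical names).
tmp3=$(mktemp -d)
cd "$tmp3" || exit 1
printf 'one\n' | bzip2 > a.bz2
printf 'two\n' | bzip2 > b.bz2

# Bundle the compressed files as-is; nothing is decompressed or recompressed.
tar cf tarofbzfiles.tar a.bz2 b.bz2

# List the members, then stream one member to stdout and decompress it.
tar tf tarofbzfiles.tar
tar xOf tarofbzfiles.tar b.bz2 | bzip2 -dc
```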
Anya Shenanigans
  • Actually I really do need one archive containing all the contents of the 27 archives, lol. I'm pretty sure one archive containing 27 archives won't work for my purpose, unfortunately. :( – dmn Aug 04 '11 at 18:50
  • What type of file is inside the .bz2 archive? If it's a tar file, then it's possible to concatenate them. It would require a script that chains the decompression of each archive into its own pipe/fifo, feeds those into a set of tar -A commands, and sends the combined result through a fifo that is piped into bzip2 – Anya Shenanigans Aug 04 '11 at 20:00
  • I *think* each file is one single (huge) XML file. I'm currently trying a command called bzcat like this: `bzcat *.bz2 > newfile.bz2`. I started it over an hour ago so we'll see how it goes...much later. :) – dmn Aug 04 '11 at 21:01
  • 2
    bzcat *.bz2 | bzip2 -c > newfile.bz2 - if you don't re-bzip2 the file you won't get the compression!! – Anya Shenanigans Aug 04 '11 at 21:39
2

You can shorten @lacos's answer by using bzcat, the built-in shorthand for bzip2 -dc, and piping back into bzip2 as usual. Not any more correct than @lacos's answer, but a little bit slicker ;)

bzcat file*.bz2 | bzip2 >resulting_file.bz2
tannermares