How would I pipe data into bzip2 and get the resulting data from its stdout in C++ on Linux?

Question

I am considering writing a library for Linux that gives application developers a virtual file system whose files are stored in an archive. Each file within the archive would be compressed individually, so that retrieving a single file is straightforward for the developer, the CPU, and the hard drive: no complicated API, and no need to decompress gigabytes of data, only the file that is actually needed rather than the whole archive.

I've used popen to retrieve the stdout of a command from C++ on Linux before, but I don't know how to pipe data in and get data out at the same time, and some bzip2-specific tips would be nice. I wrote something similar years ago, but it bundled a Huffman compression library as a DLL rather than piping data through a standard tool (that was back in my Windows days).
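For reference, the two-way piping asked about here could be sketched roughly as follows with plain POSIX calls (pipe, fork, dup2, execvp). The helper name run_filter is made up for this illustration and is not part of any library. A second child does the writing, so the parent can keep draining the command's stdout and neither side blocks on a full pipe. Treat it as a minimal sketch under those assumptions: it does not retry on EINTR and does not clean up descriptors if fork fails.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): runs argv as a child process,
// feeds `input` to its stdin and returns everything it writes to stdout.
std::string run_filter(const std::vector<std::string>& argv,
                       const std::string& input) {
    if (argv.empty()) throw std::invalid_argument("empty command");

    int to_child[2];    // [1] is written by us, [0] becomes the command's stdin
    int from_child[2];  // [1] becomes the command's stdout, [0] is read by us
    if (pipe(to_child) != 0 || pipe(from_child) != 0)
        throw std::runtime_error("pipe failed");

    pid_t filter = fork();
    if (filter < 0) throw std::runtime_error("fork failed");
    if (filter == 0) {
        // Child: wire the pipes to stdin/stdout, then become the command.
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);   close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        std::vector<char*> args;
        for (const std::string& a : argv)
            args.push_back(const_cast<char*>(a.c_str()));
        args.push_back(nullptr);
        execvp(args[0], args.data());
        _exit(127);  // only reached if execvp failed
    }
    close(to_child[0]);
    close(from_child[1]);

    // Writing and reading in the same process can deadlock once a pipe
    // buffer fills up, so a second child does nothing but write the input.
    pid_t writer = fork();
    if (writer == 0) {
        close(from_child[0]);
        const char* p = input.data();
        std::size_t left = input.size();
        while (left > 0) {
            ssize_t n = write(to_child[1], p, left);
            if (n <= 0) _exit(1);
            p += n;
            left -= static_cast<std::size_t>(n);
        }
        _exit(0);
    }
    close(to_child[1]);  // the command sees EOF once the writer is done

    // Parent: collect the command's stdout until it closes the pipe.
    std::string output;
    char buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof buf)) > 0)
        output.append(buf, static_cast<std::size_t>(n));
    close(from_child[0]);

    int status = 0;
    if (writer > 0) waitpid(writer, nullptr, 0);
    waitpid(filter, &status, 0);
    if (writer < 0) throw std::runtime_error("fork failed");
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        throw std::runtime_error("command failed");
    return output;
}
```

With the bzip2 command-line tool installed, run_filter({"bzip2", "-c"}, data) should return the compressed bytes and run_filter({"bzip2", "-dc"}, packed) should turn them back into the original data.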

Boost iostreams has a nice concept called "filters"; you can make an iostream and just throw a bzip2 filter onto it. Check it out, at least the result looks pretty neat. — Kerrek SB, Nov 16 '11 at 01:30
@KerrekSB that sounds pretty intense, I'll have to keep that in mind for future projects. — coder543, Nov 16 '11 at 06:20

score 4 · Accepted Answer · answered Nov 16 '11 at 01:34


bzip2 has a library interface -- that will probably be easier for you than invoking a subprocess.

I recommend you also have a look at the GIO library, which is already a "virtual file system for application developers"; it might be a lot less work to extend that to do what you want, than to write a library VFS from scratch.


zwol


Thanks! I'll look into this. The main reason I'm planning to write this code myself is the mental exercise. Figuring out how to compress data is outside the scope of the project, so I wanted to reuse bzip2, but the VFS and the API are the fun parts. Now, let's see what I can figure out about this bzip2 library interface. – coder543 Nov 16 '11 at 01:43

score 2 · Answer 2 · answered Nov 16 '11 at 19:41

Have a look at Boost IOStreams

As an example I created the following file from the command line:

$ echo "this is the first line" > file
$ echo "this is the second line" >> file
$ echo "this is the third line" >> file
$ bzip2 file 
$ file file.bz2 
file.bz2: bzip2 compressed data, block size = 900k

I then used a boost::iostreams::filtering_istream to read the decompressed contents of the bzip2 file named file.bz2.

#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <iostream>

namespace io = boost::iostreams;

/* To Compile:
g++ -Wall -o ./bzipIOStream ./bzipIOStream.cpp -lboost_iostreams
*/

int main(){

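    io::filtering_istream in;
    // Filters are pushed first and the data source last.
    in.push(io::bzip2_decompressor());
    in.push(io::file_source("./file.bz2"));

    // get(c) returns the stream, which tests false at end of input,
    // so only characters that were actually read are printed.
    char c;
    while(in.get(c)){
        std::cout << c;
    }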

    return 0;
}

The result of running the program is the decompressed data.

$ ./bzipIOStream 
this is the first line
this is the second line
this is the third line

You don't have to read the data character by character, of course, but I was trying to keep the example simple.
