
I have been using boost::iostreams to read an uncompressed text file. In my application, I need multiple file handles (stored in a map) to efficiently buffer data for the different parameters stored in this file. If I read a line and it belongs to a parameter at a later time than the one I am currently interested in, I restore the stream position (saved via tellg() before the call to getline()) so that I can still buffer this value at a future time.
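For reference, the save-and-restore pattern described above can be sketched as follows. This is a minimal illustration on a seekable standard stream, not my actual application code; the helper name `peekLine` and the sample data are made up for the example.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read the next line, then rewind so that the same
// line is returned again by the next read. Requires a seekable stream.
bool peekLine(std::istream& in, std::string& line) {
    const std::streampos before = in.tellg();        // position before the read
    if (before == std::streampos(-1)) return false;  // tellg() failed
    if (!std::getline(in, line)) return false;       // nothing left to read
    in.clear();                                      // the last line may set eofbit
    in.seekg(before);                                // restore the saved position
    assert(!in || in.tellg() == before);             // sanity check in debug builds
    return !in.fail();
}

// Usage sketch: an in-memory stream stands in for the uncompressed file.
bool demoPeek() {
    std::istringstream in("t=1 a\nt=2 b\n");
    std::string first, again;
    return peekLine(in, first) && std::getline(in, again) && first == again;
}
```

This works for std::ifstream and std::istringstream because their buffers support seeking; it is exactly the tellg()/seekg() pair that stops working once a decompressor is in the chain.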

However, I now wish to read gzip-compressed files while otherwise performing the same operations as before. I have run into the following issues (they have been reported before, but the suggested solution does not seem to work with my three requirements combined).

A short test main() that reproduces these issues follows:

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <boost/shared_ptr.hpp>// shared_ptr
#include <boost/iostreams/filtering_stream.hpp>// filtering_[io]stream
#include <boost/iostreams/filter/gzip.hpp>// gzip

//-----------------------------------------------------------------------------

int main( int argc, char** argv ){

    std::cout << std::scientific << std::setprecision(15) << std::endl;
    std::string fileName("test.txt.gz");
    bool gzipped(true);
    std::string line;

    // TEST 1
    boost::shared_ptr<std::ifstream> fileStream1;
    boost::shared_ptr<boost::iostreams::filtering_istream> fileFilter1;
    fileFilter1.reset( new boost::iostreams::filtering_istream );
    fileStream1.reset( new std::ifstream( fileName.c_str(), std::ios_base::in | std::ios_base::binary ) );// binary mode: gzip data is not text
    if(gzipped)
        fileFilter1->push( boost::iostreams::gzip_decompressor() );
    fileFilter1->push( *fileStream1 );
    while( std::getline( *fileFilter1, line ) ){
        //std::streampos strPos( fileFilter1->tellg() );// uncomment this line for run-time errors
        std::cout << line << std::endl;
    }
    std::cout << std::endl;

    // TEST 2
    boost::shared_ptr<std::ifstream> fileStream2;
    boost::shared_ptr<boost::iostreams::filtering_stream<boost::iostreams::input_seekable> > fileFilter2;
    fileFilter2.reset( new boost::iostreams::filtering_stream<boost::iostreams::input_seekable>() );
    fileStream2.reset( new std::ifstream( fileName.c_str(), std::ios_base::in | std::ios_base::binary ) );// binary mode: gzip data is not text
    //fileFilter2->push( boost::iostreams::gzip_decompressor() );// uncomment this line for compile-time errors
    fileFilter2->push( *fileStream2 );
    while( std::getline( *fileFilter2, line ) ){
        std::streampos strPos( fileFilter2->tellg() );
        std::cout << line << std::endl;
    }
    std::cout << std::endl;

    return 0;
}

The input file can contain whatever you like. Just make sure it has more than one line of text so that the tellg() issue shows up.

In TEST 1, I can only imagine that the errors are caused by tellg()'s sentry construction and failbit modification (http://www.cplusplus.com/reference/istream/istream/sentry/), as suggested by this ticket: https://svn.boost.org/trac/boost/ticket/2449.

Using

boost::iostreams::filtering_stream<boost::iostreams::input_seekable>

instead overcomes the tellg() issue. However, TEST 2 shows that I am unable to push a decompressor object onto this type of stream: uncommenting that line produces compile-time errors. I have not found a workaround for this.

Adrian Mole
According to a quick search on Google, you can't do random access in a gzip file unless you build an index first. [This](http://stackoverflow.com/questions/14225751/random-access-to-gzipped-files) has a bit more information, and links to people doing it (especially the zran.c file). It looks involved though, and I'm not sure whether boost can do it for you. – dgel Aug 08 '13 at 16:23
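If seeking inside the gzip stream is not available, one possible way to sidestep it is to keep the line that was read too early in memory instead of rewinding the stream. The sketch below is only an illustration of that idea: `LineReader` is a hypothetical class, not part of Boost, and it assumes that a one-line lookahead per file handle is enough. It relies only on std::getline(), so it should accept any std::istream, including a boost::iostreams::filtering_istream with a gzip_decompressor pushed onto it, though that combination is untested here.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper: one-line lookahead over any std::istream.
// It never calls tellg() or seekg(), so the stream need not be seekable.
class LineReader {
public:
    explicit LineReader(std::istream& in) : in_(in), hasPending_(false) {}

    // Copy the next line into `line` without consuming it.
    // Returns false once the input is exhausted.
    bool peek(std::string& line) {
        if (!hasPending_) {
            std::getline(in_, pending_);
            hasPending_ = !in_.fail();   // keep the line until consume()
        }
        if (hasPending_) line = pending_;
        return hasPending_;
    }

    // Discard the line most recently returned by peek().
    void consume() {
        assert(hasPending_ && "consume() requires a successful peek()");
        hasPending_ = false;
    }

private:
    std::istream& in_;
    std::string pending_;
    bool hasPending_;
};

// Usage sketch: an in-memory stream stands in for the decompressed data.
bool demoLookahead() {
    std::istringstream in("t=1 a\nt=2 b\n");
    LineReader reader(in);
    std::string first, again;
    // Peeking twice yields the same line, like a read followed by a rewind.
    return reader.peek(first) && reader.peek(again) && first == again;
}
```

A caller would peek() at a line, decide whether it belongs to the current time, and call consume() only if it does; otherwise the line simply stays buffered until it is needed.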
