
I'm trying to compress some data using Boost's gzip compression via filtering_streambuf. The compressed version is then written to disk. The problem is that the data is over 10 GB in size, and I believe the stringstream is running out of space. Assuming I can break this data up into pieces, what's the right way to use stringstream and filtering_streambuf to compress all of my data?

I've tried breaking the data up into pieces, setting the maximum chunk size to std::string::max_size()/2 and pushing several stringstream objects onto the filtering_streambuf object, but that doesn't seem to be how filtering_streambuf works :) I've also tried copying each chunk of data with bio::copy() repeatedly. I've attached sample code that isn't my exact code (I don't have access to it at the moment), but the idea is the same, except that compressed is a file stream. It's possible that something I mentioned actually works and I just have a bug in my code; if that's the case, I'll find the bug. I just need to know the correct approach for compressing a large chunk of data.
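
Roughly, the pattern looks like this (a minimal sketch with placeholder data and file name, not the exact code):

#include <sstream>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/filter/gzip.hpp>

namespace bio = boost::iostreams;

int main()
{
    std::stringstream data;                  // placeholder for the real source data
    data << "imagine several GB of data here";

    bio::filtering_streambuf<bio::output> compressed;
    compressed.push(bio::gzip_compressor());
    compressed.push(bio::file_sink("out.gz"));

    // bio::copy drains the source into the chain, then closes both
    // devices; it can only be called once per chain, which may be why
    // calling it repeatedly per chunk didn't work.
    bio::copy(data, compressed);
}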

EDIT: Added the actual code I've written. For some reason, this doesn't compile because write is not a valid function? I also can't declare a filtering_ostream. Maybe this version of Boost is old? The variables being written are chars.

boost::iostreams::filtering_streambuf<boost::iostreams::output> out;
out.push(boost::iostreams::gzip_compressor());
out.push(boost::iostreams::file_sink(fileName.c_str()));

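// error: filtering_streambuf is a std::streambuf and has no write() member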
out.write(&sizeof_sizet, 1);
out.write(&sizeof_int, 1);
out.write(&sizeof_double, 1);
out.write(&sizeof_Int, 1);

EDIT 2: This might be what I'm trying to achieve. It compiles, but I haven't tested it yet.

boost::iostreams::filtering_ostreambuf buf;
buf.push(boost::iostreams::gzip_compressor());
buf.push(boost::iostreams::file_sink(fileName.c_str()));

std::ostream out(&buf);
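// the std::ostream wrapper over the streambuf is what provides write()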

out.write(&sizeof_sizet, 1);
out.write(&sizeof_int, 1);
out.write(&sizeof_double, 1);
out.write(&sizeof_Int, 1);
– hasuchobe
  • Have you checked to see if you are running out of ram and getting disk swaps? Also, some compression algos may stress cache limits. – doug Sep 12 '19 at 02:41
  • "The compressed version is then written to disc" -- so, why don't you just compress into a filestream directly? – Dan Mašek Sep 12 '19 at 12:12
  • @DanMašek Hmm... didn't know you could do that. Let me tinker around and see what happens. – hasuchobe Sep 12 '19 at 17:09
  • Instead of `filtering_streambuf` use `filtering_stream`. The rest same as in the first code example you posted. Like [here](https://www.boost.org/doc/libs/1_55_0/libs/iostreams/doc/tutorial/filter_usage.html). – Dan Mašek Sep 12 '19 at 22:31

1 Answer


Use a filtering_stream instead of filtering_streambuf and write directly to a file to avoid having to buffer the entire compressed result in memory until completion.

#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filtering_stream.hpp>

#include <boost/iostreams/filter/gzip.hpp>

int main()
{
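    // output chain: gzip_compressor feeding a file_sink; data is compressed
    // and written to disk as it is produced, never fully buffered in memory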
    boost::iostreams::filtering_ostream out;
    out.push(boost::iostreams::gzip_compressor());
    out.push(boost::iostreams::file_sink("test.gz"));

    std::string test_string("FOO BAR BAZ....\n");

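    // note: size() + 1 also writes the string's terminating '\0'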
    out.write(test_string.c_str(), test_string.size() + 1);
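    // 'out' goes out of scope here: the chain is flushed and the gzip stream is finalized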
}

I can run it, then decompress the file it created:

>ls test.gz
ls: test.gz: No such file or directory

>test.exe

>ls test.gz
test.gz

>gzip -cd test.gz
FOO BAR BAZ....
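
The same chain scales to the multi-gigabyte case, since each write streams through the compressor straight to disk and only the compressor's internal buffers stay in memory. A minimal sketch of a chunked write loop (chunk size, chunk count, and file name are placeholders):

#include <string>
#include <boost/iostreams/device/file.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>

int main()
{
    boost::iostreams::filtering_ostream out;
    out.push(boost::iostreams::gzip_compressor());
    out.push(boost::iostreams::file_sink("big.gz"));

    std::string chunk(1 << 20, 'x');    // placeholder 1 MiB chunk
    for (int i = 0; i < 10; ++i)        // placeholder chunk count
        out.write(chunk.data(), chunk.size());
}   // 'out' destructs here, flushing the chain and finalizing the gzip stream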
– Dan Mašek