I've noticed that the destructors of boost::archive::binary_oarchive
and boost::iostreams::filtering_streambuf
are extremely slow, and I can't figure out why. I'm running the following code:
namespace bfs = boost::filesystem;
namespace bar = boost::archive;
namespace bio = boost::iostreams;
using clock = std::chrono::steady_clock;
using time_point = std::chrono::time_point<clock>;
using duration = std::chrono::duration<double>;
// main code:
time_point start, now;
duration diff;
bfs::ofstream file(path, std::ios_base::out | std::ios_base::binary);
{
    bio::filtering_streambuf<bio::output> stream;
    stream.push(bio::gzip_compressor());
    stream.push(file);
    {
        bar::binary_oarchive oarch(stream);
        start = clock::now();
        oarch << BIG_OBJECT; // results in around 2GiB of output
        now = clock::now();
        diff = now - start;
        std::cout << "Writing took " << diff.count() << " seconds." << std::endl;
        start = now;
    } // 'oarch' destructor runs here
    now = clock::now();
    diff = now - start;
    std::cout << "'oarch' destructor took " << diff.count() << " seconds." << std::endl;
    start = now;
} // 'stream' destructor runs here
now = clock::now();
diff = now - start;
std::cout << "'stream' destructor took " << diff.count() << " seconds." << std::endl;
The output that I'm getting is:
Writing took 1709.93 seconds. // around 28 minutes
'oarch' destructor took 2226.82 seconds. // around 37 minutes
'stream' destructor took 2177.07 seconds. // around 36 minutes
I ran md5sum
on the output file every 10 seconds to see if anything was changing:
- after the writing finished, the output file was around 2GiB in size,
- while the 'oarch' destructor was running, the contents and the size of the output file didn't change (the md5 hash was the same the entire time),
- after the 'oarch' destructor finished, the file grew only by around 2KiB,
- after the 'stream' destructor finished, the file was the same as when the 'oarch' destructor finished (the same size and md5 hash).
So, why are those destructors so slow, even though barely anything happens to the output file after it has been written to?
A possibly important point: while running, the process uses all 8GiB of RAM and ~70GiB of swap.