
I was wondering how I can save space when writing a bitset to a file (probably using iostream) in C++. Will breaking the bitset up into bitsets of size 8 and then writing each one to the file individually save space? What are your thoughts on this? The goal is data compression.

DogDog

2 Answers


If you normally write one byte per bit in the bitset, then yes, storing eight elements to a byte will save you 7/8 of the space in the limit (you will have to store the size of the bitset somewhere, of course).

For example, this writes a bitset using one character per bit (7/8 overhead):

for (size_t i=0, n=bs.size(); i<n; ++i)
    stream << bs[i];

while this packs eight bits into each byte, which is as compact as a byte-oriented stream allows (if we disregard padding in the last byte):

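// Number of bytes needed: the bit count divided by 8, rounded up.
for (size_t i=0, n=(bs.size() + 7) / 8; i<n; ++i) {
    uint8_t byte=0;
    for (size_t j=0; j<8; ++j) {
        size_t k = i*8 + j;
        // Past the end of the bitset, pad the last byte with zero bits.
        byte = (byte << 1) | (k < bs.size() && bs[k]);
    }
    // put() writes the raw byte with no formatting.
    stream.put(static_cast<char>(byte));
}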

Note that uint8_t is not standard C++03. It resides in C99's <stdint.h> or, as std::uint8_t, in <cstdint> from C++0x (since standardized as C++11). You can also use an std::bitset<8> if you want.
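To make the packed format concrete, here is a minimal round-trip sketch, not a definitive implementation. The helper names write_packed, read_packed and packed_round_trip_demo are made up for this illustration, and the bit order (first bit in the most significant position of each byte) is simply the one the loop above produces. It assumes the reader already knows the bit count N, so the size is not stored in the stream.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical helper: write N bits packed eight to a byte. The first bit of
// each group lands in the most significant position; the last byte is
// zero-padded when N is not a multiple of 8.
template <std::size_t N>
void write_packed(std::ostream& out, const std::bitset<N>& bs) {
    for (std::size_t i = 0; i < (N + 7) / 8; ++i) {
        unsigned char byte = 0;
        for (std::size_t j = 0; j < 8; ++j) {
            const std::size_t k = i * 8 + j;
            const bool bit = k < N && bs[k];
            byte = static_cast<unsigned char>((byte << 1) | (bit ? 1u : 0u));
        }
        out.put(static_cast<char>(byte));  // raw byte, no formatting
    }
}

// Hypothetical helper: inverse of write_packed. Returns false if the stream
// ends before (N + 7) / 8 bytes have been read.
template <std::size_t N>
bool read_packed(std::istream& in, std::bitset<N>& bs) {
    for (std::size_t i = 0; i < (N + 7) / 8; ++i) {
        char c;
        if (!in.get(c)) return false;
        const unsigned char byte = static_cast<unsigned char>(c);
        for (std::size_t j = 0; j < 8; ++j) {
            const std::size_t k = i * 8 + j;
            if (k < N) bs[k] = ((byte >> (7 - j)) & 1u) != 0;
        }
    }
    return true;
}

// Usage sketch: round-trip 600 bits through an in-memory binary stream and
// return how many bytes the packed form occupies.
std::size_t packed_round_trip_demo() {
    std::bitset<600> original;
    original.set(0).set(13).set(599);
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_packed(ss, original);
    std::bitset<600> restored;
    const bool ok = read_packed(ss, restored);
    assert(ok && restored == original);
    return ss.str().size();
}
```

For a bitset of size 600 this writes 75 bytes, against 600 with one character per bit. When writing to a real file, open the stream with std::ios::binary so that no newline translation alters the bytes.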

Fred Foo
  • I'm just not sure how the write behaves if I write a bitset of, let's say, size 600. – DogDog Mar 02 '11 at 20:51
  • @Apoc: I don't understand what you're afraid of. Could you post some code? – Fred Foo Mar 02 '11 at 20:52
  • @Apoc: You might want to use a `boost::dynamic_bitset` (http://www.boost.org/doc/libs/release/libs/dynamic_bitset/dynamic_bitset.html) instead if your bitsets are very large and can have variable size. – Emile Cormier Mar 02 '11 at 20:57
  • Do you know if using an `ostream_iterator` will also use a byte per bit? What about using an `ostream_iterator` with `vector`? – user470379 Mar 02 '11 at 22:21
  • @user470379: neither will write single bits, since an `ostream` simply won't allow that. C++ streams are fundamentally byte-oriented, as are modern operating systems, file systems, network devices, etc. – Fred Foo Mar 02 '11 at 22:27

If you use boost::dynamic_bitset instead, you can specify the type of the underlying blocks and copy them out and back in with the to_block_range and from_block_range functions.

http://www.boost.org/doc/libs/1_46_0/libs/dynamic_bitset/dynamic_bitset.html#to_block_range

(for example, use unsigned char as block type and store them in a stream in binary mode)

Rexxar