
I am working on a project that uses std::bitset. The text file provided is very large (>800 MB), and loading it directly into std::bitset takes more than 25 seconds. So I want to preprocess the text file into a memory-dumped binary file. Since each '0'/'1' character (8 bits) converts to a single bit, the file-load time should drop considerably. I wrote a demo:

#include <iostream>      
#include <bitset>         
#include <string>
#include <stdexcept>      
#include <fstream>
#include <math.h> 

int main () {
    const int MAX_SIZE = 19;
    try {

        std::string line = "1001111010011101011";
        int copy_bypes = (int)ceil((float)MAX_SIZE / 8.0);


        std::bitset<MAX_SIZE>* foo = new (std::nothrow)std::bitset<MAX_SIZE>(line);     // foo: 0000
        std::ofstream os ("data.dat", std::ios::binary);
        os.write((const char*)&foo, copy_bypes);
        os.close();


        std::bitset<MAX_SIZE>* foo2 = new (std::nothrow)std::bitset<MAX_SIZE>();
        std::ifstream input("data.dat",std::ios::binary);
        input.read((char*)&foo2, copy_bypes);
        input.close();

        for (int i = foo2->size() -1 ; i >=0 ; --i) {
            std::cout  << (*foo2)[i];
        }
        std::cout <<std::endl;
    }
    catch (const std::invalid_argument& ia) {
        std::cerr << "Invalid argument: " << ia.what() << '\n';
    }
    return 0;
}

It seems to work fine, but I am worried whether this usage will really hold up in a production environment.

Thanks in advance.

buld0zzr
  • You may want to give a look to http://stackoverflow.com/questions/5251403/binary-serialization-of-stdbitset – Holt Jul 19 '16 at 09:04
    `std::bitset<MAX_SIZE>* foo = new (std::nothrow)std::bitset<MAX_SIZE>(line);` -- Why are you dynamically allocating here? And I don't see how this works "fine" when you give the address of the pointer (`&foo`) to `os.write`. – PaulMcKenzie Jul 19 '16 at 09:49
  • *As the text file provided is very large(>800M), to load it directly to std::bitset will cost more then 25 seconds.* -- You have a very poor disk system if that's the case. – PaulMcKenzie Jul 19 '16 at 09:52

2 Answers


Writing a non-trivial class to a file as raw binary is really dangerous. You should convert the bitset to well-defined binary data first. If you know your data will fit in an unsigned long long, you can use bitset<>::to_ullong() and write/read that unsigned long long. If you want this to be portable between, e.g., 64-bit and 32-bit platforms, you should use fixed-size types.

stryku

These two lines are wrong

os.write((const char*)&foo, copy_bypes);
input.read((char*)&foo2, copy_bypes);

You're passing the address of the pointers foo and foo2, not the std::bitset objects themselves. But even with that corrected:

os.write((const char*)foo, copy_bypes);
input.read((char*)foo2, copy_bypes);

it would still be unsafe to use in a production environment. Here you're assuming that std::bitset is a POD type and accessing it as such. As your code grows more complex, you risk writing or reading too much, and there are no safeguards to stop undefined behavior from happening. std::bitset was made to be convenient, not fast, and that shows in the methods it provides to access bits - there's no proper way of obtaining the address of its storage, as, for example, std::vector or std::string provide. If you need performance, you'll need to roll your own implementation.

buld0zzr