
How can I quickly convert a string of ones and zeroes separated by spaces into a bitset?

std::bitset has a constructor that initializes it from a string of '0' and '1' characters with no separators, a default constructor that sets every bit to zero, and one that initializes it from an integer. Off the top of my head, I can think of three ways:

  • Removing the spaces from the string and passing it to the constructor
  • Converting the binary into an integer and passing it to the constructor
  • Initializing all values to zero and changing the value of each bit according to the string in a for-loop

The number of bits is 24, and each string has exactly 24 bits, no more, no less.
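For comparison, the three approaches listed above might look roughly like this for the 24-bit case. This is a minimal sketch, not the asker's benchmarked code (that code is not reproduced on this page). The names via_erase, via_integer and via_loop are hypothetical, and all three assume the first digit in the string is the most significant bit, which is how the std::bitset string constructor reads it.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Method 1 (hypothetical name): strip whitespace, then use the string constructor.
// Takes the string by value, so the caller's copy is not modified.
std::bitset<24> via_erase(std::string s) {
    // Cast to unsigned char: passing a negative char to std::isspace is undefined.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return std::bitset<24>(s);  // throws std::invalid_argument on any other character
}

// Method 2 (hypothetical name): accumulate the value as an integer first,
// then use the unsigned long long constructor.
std::bitset<24> via_integer(const std::string& s) {
    unsigned long long value = 0;
    for (char c : s) {
        if (c == '0' || c == '1') {
            value = (value << 1) | static_cast<unsigned long long>(c - '0');
        }
    }
    return std::bitset<24>(value);
}

// Method 3 (hypothetical name): start from all zeros and set bits one by one.
std::bitset<24> via_loop(const std::string& s) {
    std::bitset<24> bits;              // default constructor: all bits zero
    std::size_t pos = 24;              // index of the next bit to fill, counting down
    for (char c : s) {
        if (c != '0' && c != '1') continue;  // skip the separators
        if (pos == 0) break;                 // ignore anything past the 24th digit
        --pos;
        if (c == '1') bits.set(pos);
    }
    assert(pos == 0 && "fewer than 24 digits in input");
    return bits;
}
```

All three should yield the same bitset for the inputs described in the question. Only via_erase relies on the library to validate characters; the other two silently skip anything that is not '0' or '1'.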

EDIT: Here's the code I use to test performance, and here's the code for methods one and two. On my machine, method 1 takes 3 ms, and method 3 takes 14 ms.

EDIT 2: My compiler settings are -O3 -o -g --std=c++11. I used both gcc and clang.

noɥʇʎԀʎzɐɹƆ

  • I'd go with the 1st option. – πάντα ῥεῖ Jun 30 '16 at 18:45
  • I'd just remove all the spaces, go with option 1. – Jesper Juhl Jun 30 '16 at 18:56
  • I'm still looking for methods other than the mentioned three. In my testing, method 1 is 4.67x faster than method 3 – noɥʇʎԀʎzɐɹƆ Jun 30 '16 at 20:02
  • @uoɥʇʎPʎzɐɹC You're testing an optimized, release build, right? And hopefully you're removing the spaces by `yourstring.erase(std::remove_if(yourstring.begin(), yourstring.end(), ::isspace), yourstring.end());` – PaulMcKenzie Jun 30 '16 at 20:09
  • @PaulMcKenzie Optimized debug build; and yes, I'm using that method – noɥʇʎԀʎzɐɹƆ Jun 30 '16 at 20:43
  • @uoɥʇʎPʎzɐɹC What is an "optimized debug build"? What are the exact compiler options you're using when you built your application? – PaulMcKenzie Jun 30 '16 at 21:24
  • @PaulMcKenzie Update: on a 100% production build (gcc and clang somehow), method 1 is 17.33% faster – noɥʇʎԀʎzɐɹƆ Jun 30 '16 at 21:34
  • @uoɥʇʎPʎzɐɹC I'm still looking for the compiler options you used. You should have `-O2` or `-O3` as a command-line option when building. Otherwise, you're running an unoptimized build, and those times you're giving us are meaningless. Please don't let this post devolve into one where we find out "yeah, now my times are much faster once I use those options". A lot of people helping on SO wind up wasting their time when things like this happen. – PaulMcKenzie Jun 30 '16 at 21:35
  • @PaulMcKenzie `-O3 -o -g --std=c++11` – noɥʇʎԀʎzɐɹƆ Jun 30 '16 at 21:37
  • ok. That should always accompany a post when it comes to timing and performance. – PaulMcKenzie Jun 30 '16 at 21:38
  • @πάνταῥεῖ Option 1 is slower than Dieter's implementation. – noɥʇʎԀʎzɐɹƆ Jun 30 '16 at 21:44
  • Voted to close as too broad, because there are an unbounded number of different answers. Mostly optimization of a computation involves using pre-established knowledge and constraints. The variations are endless (e.g. they include a table of character pairs with resulting bits). – Cheers and hth. - Alf Jul 01 '16 at 00:02
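Following the point about exploiting known constraints: if, as in the sample strings, every input is exactly 24 digits separated by single spaces, the separators never need to be searched for, because digit k always sits at index 2*k. The single-space layout is an assumption not stated in the question, and parse_fixed_stride is a hypothetical name; this is a sketch, not a measured improvement.

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper. Assumes the exact layout of the sample strings:
// 24 digits separated by single spaces (47 characters), so digit k is at s[2 * k].
std::bitset<24> parse_fixed_stride(const std::string& s) {
    if (s.size() != 2 * 24 - 1) {
        throw std::invalid_argument("expected 24 digits separated by single spaces");
    }
    unsigned long long value = 0;
    for (std::size_t k = 0; k < 24; ++k) {
        // Only the digit positions are read; anything other than '1' counts as 0.
        value = (value << 1) | static_cast<unsigned long long>(s[2 * k] == '1');
    }
    return std::bitset<24>(value);
}
```

The length check is the only validation; a string with the right length but a different layout would be misread rather than rejected.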

1 Answer


A conversion that does not mutate the input string: each '0' or '1' is shifted into an unsigned integer, and the bitset is then constructed from that integer:

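#include <bitset>
#include <cstddef>

// Walks the C string and shifts each '0' or '1' into the accumulator;
// every other character (spaces included) is skipped.
// Written as a single return statement so it is a valid C++11 constexpr function.
constexpr unsigned long long
extract_bits(const char* ptr, unsigned long long accumulator) {
    return (*ptr == 0)
        ? accumulator                              // end of string
        : extract_bits(ptr + 1, (*ptr == '1')
            ? accumulator << 1u | 1u               // append a 1 bit
            : (*ptr == '0')
                ? accumulator << 1                 // append a 0 bit
                : accumulator);                    // ignore anything else
}

// std::bitset's unsigned long long constructor is constexpr,
// so the whole conversion can run at compile time.
template <std::size_t N>
constexpr std::bitset<N>
to_bitset(const char* ptr) {
    return std::bitset<N>(extract_bits(ptr, 0));
}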

#include <iostream>
int main()
{
    constexpr auto b = to_bitset<24>("0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 0");
    std::cout << b << '\n';
    return 0;
}

Note: the conversion silently ignores every character other than '0' and '1', so a string like "01-01" is accepted too.
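To illustrate the note: because extract_bits is constexpr, its handling of separators can be checked at compile time with static_assert. The helper is repeated so the snippet compiles on its own. Keep in mind that the accumulator is an unsigned long long, so with more than 64 digits the earliest (most significant) ones are shifted out, and the std::bitset<N> constructor keeps only the low N bits of the value.

```cpp
// Same recursive helper as in the answer above (C++11 single-return constexpr).
constexpr unsigned long long
extract_bits(const char* ptr, unsigned long long accumulator) {
    return (*ptr == 0)
        ? accumulator
        : extract_bits(ptr + 1, (*ptr == '1')
            ? accumulator << 1u | 1u
            : (*ptr == '0')
                ? accumulator << 1
                : accumulator);
}

// Evaluated at compile time: separators and other characters are skipped.
static_assert(extract_bits("0 1 0 1", 0) == 5, "spaces are ignored");
static_assert(extract_bits("01-01", 0) == 5, "'-' is ignored as well");
static_assert(extract_bits("x1y0z", 0) == 2, "so is any other character");
```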

Timing the conversion above against erasing the spaces from a string and then constructing the bitset:

#include <algorithm>
#include <bitset>
#include <cctype>
#include <chrono>
#include <iostream>
#include <random>
#include <string>
#include <vector>

// extract_bits() and to_bitset() from the listing above must be defined here.

using namespace std::chrono;

// steady_clock is monotonic, which suits interval timing better than system_clock.
void print_duration(const char* what, steady_clock::time_point start, steady_clock::time_point stop) {
    auto duration = duration_cast<microseconds>(stop - start);
    std::cout << what << ": " << duration.count() << " us" << std::endl;
}

// Writing to a volatile keeps the optimizer from discarding the conversions.
volatile unsigned long long result;

int main()
{
    const std::string sample = "0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 0";
    std::vector<std::string> strings(1000, sample);
    std::random_device random_device;
    std::mt19937 random_generator(random_device());
    // Shuffle each copy (digits and spaces alike) so the inputs differ.
    for(auto& str : strings) {
        std::shuffle(str.begin(), str.end(), random_generator);
    }

    // Non-mutating to_bitset
    {
        auto start = steady_clock::now();
        for(const auto& str : strings) {
            auto b = to_bitset<24>(str.c_str());
            result = b.to_ullong();
        }
        auto stop = steady_clock::now();
        print_duration("to_bitset", start, stop);
    }
    // Erasing spaces, then using the string constructor
    {
        auto start = steady_clock::now();
        for(auto& str : strings) {
            str.erase(std::remove_if(str.begin(), str.end(), ::isspace), str.end());
            auto b = std::bitset<24>(str);
            result = b.to_ullong();
        }
        auto stop = steady_clock::now();
        print_duration("str.erase", start, stop);
    }
    return 0;
}

With g++ 4.8.4 (g++ -std=c++11 -O3), to_bitset is about 3 times faster than erasing the spaces from the string and then constructing a bitset from it.