14

I'm currently trying to implement various algorithms in a Just In Time (JIT) compiler. Many of the algorithms operate on bitmaps, more commonly known as bitsets.

In C++ there are various ways of implementing a bitset. As a true C++ developer, I would prefer to use something from the STL. The most important aspect is performance. I don't necessarily need a dynamically resizable bitset.

As I see it, there are three possible options.

I. One option would be to use std::vector<bool>, which is specialized for space efficiency and typically packs each element into a single bit. As a consequence, the elements are not stored as a contiguous array of bool, and every access goes through a proxy object that shifts and masks, which I guess could decrease performance. On the other hand, having one bit for each bool value could improve speed since it's very cache-friendly.

II. Another option would be to instead use a std::vector<char>. It guarantees that the data is contiguous in memory and it's easier to access individual elements. However, it feels strange to use this option since it's not intended to be a bitset.

III. The third option would be to use the actual std::bitset. The fact that it's not dynamically resizable doesn't matter.

Which one should I choose for maximum performance?

Man of One Way
  • 4
    Benchmark! [Related.](http://www.youtube.com/watch?v=wg4trPZFUwc) –  Jul 29 '12 at 20:01
  • 3
    There is also [Boost.Dynamic Bitset](http://www.boost.org/doc/libs/1_50_0/libs/dynamic_bitset/dynamic_bitset.html) to consider. But seriously, there is really no way to tell which option has the best performance without knowing the usage pattern. For example: if your collection is small and often accessed, `vector<char>` might give you faster access than the bitsets, because it doesn't have to do bit shifting/masking. However, when the collection is bigger or accessed less often, the higher number of cache misses caused by the bigger memory footprint might very well kill that benefit. – Grizzly Jul 29 '12 at 20:18
  • At the risk of pointing out something possibly obvious: the std::bitset is allocated on the stack and is thus pretty limited in maximum size in most cases. I don't know anything about the amount of data you need to store, however. – identity Jul 29 '12 at 20:25
  • How big does it need to be? I mean, can you just fit it in an unsigned long long or something like that? – harold Jul 29 '12 at 22:58

3 Answers

8

Best way is to just benchmark it, because every situation is different.

I wouldn't use std::vector<bool>. I tried it once and the performance was horrible. I could improve the performance of my application by simply using std::vector<char> instead.

I didn't really compare std::bitset with std::vector<char>, but if space is not a problem in your case, I would go for std::vector<char>. It uses 8 times more space than a bitset, but since it doesn't have to do bit-operations to get or set the data, it should be faster.

Of course, if you need to store lots of data in the bitset/vector, then it could be beneficial to use bitset, because the packed data is more likely to fit in the processor's cache.
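Since the advice above boils down to "measure it", here is a minimal sketch of what such a measurement could look like. The names `kBits`, `Result` and `run_benchmark` are made up for illustration, the access pattern (write every flag, then read every flag) is an assumption, and a serious benchmark would use your real access pattern, optimized builds and several repetitions.

```cpp
#include <bitset>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical size chosen for illustration; use your real working-set size.
constexpr std::size_t kBits = 1u << 16;

struct Result {
    std::size_t set_count;  // checksum, so the loops have an observable effect
    double      millis;     // wall-clock time for the whole run
};

// Sets every third flag, then counts the set flags, `rounds` times.
// Works with any container that offers operator[] and holds kBits elements:
// std::vector<bool>, std::vector<char> or std::bitset<kBits>.
template <typename Flags>
Result run_benchmark(Flags& flags, int rounds) {
    assert(rounds > 0);
    const auto start = std::chrono::steady_clock::now();
    std::size_t count = 0;
    for (int r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < kBits; ++i) flags[i] = (i % 3 == 0);
        for (std::size_t i = 0; i < kBits; ++i) count += flags[i] ? 1 : 0;
    }
    const auto stop = std::chrono::steady_clock::now();
    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return Result{count, ms};
}
```

Call `run_benchmark` once per candidate container and compare the `millis` fields; the `set_count` fields should be identical across containers, which doubles as a sanity check.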

The easiest way is to use a typedef and hide the implementation. Both bitset and vector support the [] operator, so it should be easy to swap one implementation for the other.
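A minimal sketch of the typedef idea, assuming client code only needs `operator[]` and `size()`. `FlagSet`, `make_flags` and `count_set` are hypothetical names, not part of any library. Construction is wrapped in a helper because the containers are constructed differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single point of change: swap this alias to try another container.
typedef std::vector<char> FlagSet;
// typedef std::vector<bool> FlagSet;  // alternative with the same interface

// Creation goes through one helper so call sites do not depend on the
// constructor of the underlying container. All flags start cleared.
inline FlagSet make_flags(std::size_t n) {
    return FlagSet(n);
}

// Client code relies only on operator[] and size().
inline std::size_t count_set(const FlagSet& flags) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        if (flags[i]) ++n;
    }
    return n;
}
```

Switching to `std::bitset` takes a little more work than changing the alias, because its size is a template parameter rather than a constructor argument, so `make_flags` would have to change as well.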

Patrick
  • The `operator[]` are similar enough yes, but the constructors aren't. – Mooing Duck Jul 22 '14 at 18:41
  • @MooingDuck: True. I use typedefs to simplify migration from one type to another, but not to make it effortless. I also use typedefs for collections so I can hide the real implementation (list, vector, deque, ...), which reduces the real code changes by about 90% if I ever change container type. – Patrick Jul 24 '14 at 12:31
5

I answered a similar question recently in this forum. I recommend my BITSCAN library. I have just released version 1.0. BITSCAN is specifically designed for fast bit scanning operations.

A BitBoard class wraps a number of different implementations of typical operations such as bsf, bsr or popcount for 64-bit words (aka bitboards). The classes BitBoardN, BBIntrin and BBSentinel extend bit scanning to bit strings. A bit string in BITSCAN is an array of bitboards. The base wrapper class for a bit string is BitBoardN. BBIntrin extends BitBoardN by using Windows compiler intrinsics over 64-bit bitboards. BBIntrin is made portable to POSIX by using the appropriate asm equivalent functions.
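For readers unfamiliar with these operations, the following is a portable sketch of bsf, bsr and popcount on a 64-bit word, plus a scan over a bit string stored as an array of words. This is not BITSCAN's code or API; the function names are invented for illustration, and the loops stand in for the intrinsics a real library would use (C++20 also offers `std::countr_zero` and `std::popcount` in `<bit>`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of set bits (Kernighan's method: clear the lowest set bit each step).
inline int popcount64(std::uint64_t w) {
    int n = 0;
    for (; w != 0; w &= w - 1) ++n;
    return n;
}

// Index of the least significant set bit (bsf). Precondition: w != 0.
inline int bsf64(std::uint64_t w) {
    assert(w != 0);
    int i = 0;
    while ((w & 1u) == 0) { w >>= 1; ++i; }
    return i;
}

// Index of the most significant set bit (bsr). Precondition: w != 0.
inline int bsr64(std::uint64_t w) {
    assert(w != 0);
    int i = 0;
    while ((w >>= 1) != 0) ++i;
    return i;
}

// Positions of all set bits in a bit string stored as 64-bit words,
// in increasing order. Word k covers bit positions [64k, 64k + 63].
inline std::vector<std::size_t>
set_positions(const std::vector<std::uint64_t>& words) {
    std::vector<std::size_t> out;
    for (std::size_t k = 0; k < words.size(); ++k) {
        for (std::uint64_t w = words[k]; w != 0; w &= w - 1) {
            out.push_back(k * 64 + static_cast<std::size_t>(bsf64(w)));
        }
    }
    return out;
}
```

The scan visits only the set bits of each word instead of testing all 64 positions, which is where word-based bit strings gain over per-element containers for sparse sets.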

I have used BITSCAN to implement a number of efficient solvers for NP combinatorial problems in the graph domain. Typically, the adjacency matrix of the graph and the vertex sets are encoded as bit strings, and typical computations are performed using bit masks. Code for simple bit-encoded graph objects is available in GRAPH. Examples of how to use BITSCAN and GRAPH are also available.
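To illustrate the encoding described above, here is a small sketch that uses `std::bitset` in place of BITSCAN's bit strings. `BitGraph`, `VertexSet` and `kMaxVertices` are hypothetical names and have nothing to do with the actual GRAPH classes; the fixed capacity of 64 vertices is an assumption made to keep the example short.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxVertices = 64;  // hypothetical fixed capacity
typedef std::bitset<kMaxVertices> VertexSet;

// Undirected graph whose adjacency matrix is stored one bit string per row.
struct BitGraph {
    std::vector<VertexSet> adj;

    explicit BitGraph(std::size_t n) : adj(n) { assert(n <= kMaxVertices); }

    void add_edge(std::size_t u, std::size_t v) {
        assert(u < adj.size() && v < adj.size());
        adj[u].set(v);
        adj[v].set(u);
    }

    // Vertices adjacent to both u and v: a single AND of two rows.
    VertexSet common_neighbors(std::size_t u, std::size_t v) const {
        return adj[u] & adj[v];
    }

    // True if every pair of distinct vertices in s is adjacent.
    bool is_clique(const VertexSet& s) const {
        for (std::size_t v = 0; v < adj.size(); ++v) {
            if (!s.test(v)) continue;
            VertexSet others = s;
            others.reset(v);
            // All other members of s must appear in v's adjacency row.
            if ((others & adj[v]) != others) return false;
        }
        return true;
    }
};
```

Set operations such as intersection become one bitwise instruction per machine word, which is the main reason this encoding is popular for clique-style solvers.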

A comparison between BITSCAN and typical implementations in STL (bitset) and BOOST (dynamic_bitset) can be found here: http://blog.biicode.com/bitscan-efficiency-at-glance/

chesslover
1

You might also be interested in this (somewhat dated) paper: http://www.cs.up.ac.za/cs/vpieterse/pub/PieterseEtAl_SAICSIT2010.pdf

[Update] The previous link seems to be broken, but I think it was pointing to this article: https://www.researchgate.net/publication/220803585_Performance_of_C_bit-vector_implementations

rturrado
D. A.
  • In short, here is the conclusion of the paper: "We have shown that `boost::dynamic_bitset` is considerably more efficient than most of the other implementations in terms of execution speed, while the implementation using `std::vector` outperformed the other implementations in terms of memory efficiency." – davidhigh Jul 27 '14 at 21:06