Fastest way to compare bitsets (< operator on bitsets)?

Question

What is the most optimized way to implement a < operator for std::bitset that corresponds to comparing the unsigned integer representations (it should work for bitsets of more than 64 bits)?

A trivial implementation would be:

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i = N-1; i >= 0; i--) {
        if (x[i] && !y[i]) return false;
        if (!x[i] && y[i]) return true;
    }
    return false;
}

When I say "most optimized way" I am looking for implementations using bitwise operations and metaprogramming tricks (and things like that).

EDIT: I think that I've found the trick: template metaprogramming for compile-time recursion and right bitshift in order to compare bitsets as several unsigned long long integers. But no clear idea of how to do that...

About your idea using right bitshift: That'd create a lot of intermediate objects and `to_ullong` will have to check if the shifted values do *fit* into an `unsigned long long` for each check, hence creating quite some overhead. I doubt it'd be faster, although only a benchmark could prove that. — Daniel Frey, Jan 20 '14 at 22:39
Copy the code for std::bitset, rename it, give it a method to access a word at a time. — brian beuning, Jan 20 '14 at 23:04
@brianbeuning If you are copying the code anyways, you can simply provide an `operator<` which has access to the internals. — Daniel Frey, Jan 21 '14 at 09:05
@Vincent I've updated with runtimes below: bit-wise (current most upvotes), block-wise, and template recursive block-wise. — user, Jan 22 '14 at 16:55
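A minimal portable sketch of the shift idea from the question's EDIT, using only the public std::bitset interface. The name `less_by_chunks` and the 64-bit chunk width are assumptions for illustration. As the comments above point out, every shift creates a temporary bitset, so only a benchmark can tell whether it beats the bit-by-bit loop.

```cpp
#include <bitset>
#include <cstddef>

// Compare two bitsets as unsigned integers, 64 bits at a time.
template <std::size_t N>
bool less_by_chunks(const std::bitset<N>& x, const std::bitset<N>& y)
{
    const std::size_t chunk_bits = 64;        // assumed chunk width
    const std::bitset<N> mask(~0ULL);         // low 64 bits set (all bits if N < 64)
    const std::size_t chunks = (N + chunk_bits - 1) / chunk_bits;
    for (std::size_t k = chunks; k-- > 0; ) {
        // Shift chunk k down to the low bits, mask it, and read it as an integer.
        const unsigned long long a = ((x >> (k * chunk_bits)) & mask).to_ullong();
        const unsigned long long b = ((y >> (k * chunk_bits)) & mask).to_ullong();
        if (a != b) return a < b;             // most significant differing chunk decides
    }
    return false;                             // all chunks equal
}
```

The mask guarantees that to_ullong() never sees a bit above position 63, so it cannot throw.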

score 12 · Accepted Answer · answered Jan 20 '14 at 22:17

The obvious optimization would be

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    for (int i = N-1; i >= 0; i--) {
        if (x[i] ^ y[i]) return y[i];
    }
    return false;
}

Other than that, it should be practically impossible to test more than one bit at a time, as there is no standard-conforming way to access the underlying words. You could benchmark x.to_string() < y.to_string() and hope that both to_string() and string comparison are optimized better than bitwise access to a bitset, but that's a long shot.
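The two standard-conforming alternatives mentioned in this answer and its comments can be sketched as follows. The helper names `less_by_string` and `less_small` are made up for illustration, and neither is claimed to be faster than the loop above without a benchmark.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

// to_string() writes the highest-index bit first, so for two strings of the
// same length, lexicographic order on '0'/'1' matches unsigned-integer order.
template <std::size_t N>
bool less_by_string(const std::bitset<N>& x, const std::bitset<N>& y)
{
    return x.to_string() < y.to_string();
}

// For bitsets that fit in one unsigned long long, a single integer
// comparison is enough.
template <std::size_t N>
bool less_small(const std::bitset<N>& x, const std::bitset<N>& y)
{
    static_assert(N <= 64, "less_small is only meant for bitsets of at most 64 bits");
    return x.to_ullong() < y.to_ullong();
}
```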

Daniel Frey

@dyp Who knows. It's a question of performance, so in the end you'd have to benchmark it. And it might change with each compiler version. If thinking about "small" bitsets, one could also specialize for <=64 bits by using `to_ullong`, but I guess the spirit of this question is more like a couple hundred bits. – Daniel Frey Jan 20 '14 at 22:21
+1 For a solution its size, it's hard to do better. For template recursion version, see below. – user Jan 22 '14 at 16:48
Note that even if `std::bitset` would expose some `.data()` member, the lexicographical ordering from the standard containers and `std::tuple` is hard to optimize using that knowledge. The tempting thing to do would be to do integer comparison on the underlying word representation, but that actually corresponds to *reverse colexicographical* ordering. You could use `std::lexicographical_compare(rbegin(R.data), rend(R.data), rbegin(L.data), rend(L.data))` as `operator<(L, R)`. The "reverse" corresponds to the L/R reversal, and the "co" to the reverse iterators in "reverse colexicographical". – TemplateRex Aug 10 '14 at 11:30
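For the unsigned-integer ordering asked about in the question, the word-level comparison could look like the sketch below. It assumes a hypothetical word array in which `words[0]` holds the least significant 64 bits (std::bitset exposes no such member), and it walks the words through reverse iterators so that the most significant word is examined first.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical word storage: element 0 holds the least significant 64 bits.
template <std::size_t K>
using Words = std::array<std::uint64_t, K>;

// Unsigned-integer "less than": lexicographic comparison that starts at the
// most significant word, hence the reverse iterators.
template <std::size_t K>
bool less_words(const Words<K>& l, const Words<K>& r)
{
    return std::lexicographical_compare(l.rbegin(), l.rend(), r.rbegin(), r.rend());
}
```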

score 5 · Answer 2 · answered Mar 26 '16 at 09:52

If you are willing to adapt the solution if the STL bitset implementation changes, you may use

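template<size_t n>
bool compare(const bitset<n>& l, const bitset<n>& r){
  if(n > 64){
    // Non-portable: assumes the bitset is laid out as an array of 64-bit
    // words with the least significant word first.
    typedef array<unsigned long long, (n + 63) / 64> AsArray;
    const AsArray& a = *reinterpret_cast<const AsArray*>(&l);
    const AsArray& b = *reinterpret_cast<const AsArray*>(&r);
    // Compare from the most significant word down.
    return lexicographical_compare(a.rbegin(), a.rend(), b.rbegin(), b.rend());
  }//else
  return l.to_ullong() < r.to_ullong();
}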

Because n is a compile-time constant, the compiler throws away the irrelevant branch of the if.

user · Answer 3 · 2014-01-22T16:59:27.600

I just looked at the source code, but unfortunately (unless, hopefully, I am mistaken), they don't seem to give you in-place access to a const & unsigned long for a particular block of bits. If they did, then you could perform template recursion, and effectively compare each unsigned long rather than each bit in an unsigned long.

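After all, comparing A < B bit by bit from the most significant end is equivalent to comparing block by block from the most significant end: the first block where they differ decides the result, with A < B exactly when A[i] < B[i] there.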

I hate to say it, but I would probably roll my own using recursion on C++11's std::array. If you have access to the blocks, then you can make a template recursive function to do this pretty easily (and as I'm sure you know since you're asking for metaprogramming) give the compiler a great chance to optimize.

All in all, not a great answer, but that's what I would do.

Excellent question, by the way.

===========

EDIT

This should time three approaches: the one with the most current upvotes, the block strategy I described, and a template recursive variant. I fill a vector with bitsets and then sort repeatedly using the specified comparator functor.

Happy hacking!

Output on my computer:

compiled g++ -std=c++11 -Wall -g test.cpp
RUNTIMES:
    std::bitset         4530000 (6000000 original in OP)
    Block-by-block      900000
    Template recursive  730000

compiled g++ -std=c++11 -Wall -g -O3 test.cpp
RUNTIMES:
    std::bitset         700000 (740000 original in OP)
    Block-by-block      470000
    Template recursive  530000

C++11 code:

#include <iostream>
#include <bitset>
#include <array>
#include <vector>
#include <algorithm>
#include <stdlib.h>
#include <time.h>

/* Existing answer. Note that I've flipped the order of bit significance to match my own */
template<std::size_t N>
class BitByBitComparator
{
public:
  bool operator()(const std::bitset<N>& x, const std::bitset<N>& y) const
  {
    for (int i = 0; i < N; ++i) {
      if (x[i] ^ y[i]) return y[i];
    }
    return false;
  }
};

/* New simple bit set class (note: mostly untested). Also note bad
   design: should only allow read access via immutable facade. */
template<std::size_t N>
class SimpleBitSet
{
public:
  static const int BLOCK_SIZE = 64;
  static const int LOG_BLOCK_SIZE = 6;
  // Round up so that N need not be a multiple of BLOCK_SIZE.
  static constexpr int NUM_BLOCKS = (N + BLOCK_SIZE - 1) >> LOG_BLOCK_SIZE;
  std::array<unsigned long int, NUM_BLOCKS> allBlocks;
  SimpleBitSet()
  {
    allBlocks.fill(0);
  }
  void addItem(int itemIndex)
  {
    // TODO: can do faster
    int blockIndex = itemIndex >> LOG_BLOCK_SIZE;
    unsigned long int & block = allBlocks[blockIndex];
    int indexWithinBlock = itemIndex % BLOCK_SIZE;
    block |= (0x8000000000000000 >> indexWithinBlock);
  }
  bool getItem(int itemIndex) const
  {
    int blockIndex = itemIndex >> LOG_BLOCK_SIZE;
    unsigned long int block = allBlocks[blockIndex];
    int indexWithinBlock = itemIndex % BLOCK_SIZE;
    return bool((block << indexWithinBlock) & 0x8000000000000000);
  }
};

/* New comparator type 1: block-by-block. */
template<std::size_t N>
class BlockByBlockComparator
{
public:
  bool operator()(const SimpleBitSet<N>& x, const SimpleBitSet<N>& y) const
  {
    return ArrayCompare(x.allBlocks, y.allBlocks);
  }

  template <std::size_t S>
  bool ArrayCompare(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    for (int i=0; i<S; ++i)
      {
        unsigned long int lhsBlock = lhs[i];
        unsigned long int rhsBlock = rhs[i];
        if (lhsBlock < rhsBlock) return true;
        if (lhsBlock > rhsBlock) return false;
      }
    return false;
  }
};

/* New comparator type 2: template recursive block-by-block. */
template <std::size_t I, std::size_t S>
class TemplateRecursiveArrayCompare;

template <std::size_t S>
class TemplateRecursiveArrayCompare<S, S>
{
public:
  bool operator()(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    return false;
  }
};

template <std::size_t I, std::size_t S>
class TemplateRecursiveArrayCompare
{
public:
  bool operator()(const std::array<unsigned long int, S> & lhs, const std::array<unsigned long int, S> & rhs) const
  {
    unsigned long int lhsBlock = lhs[I];
    unsigned long int rhsBlock = rhs[I];
    if (lhsBlock < rhsBlock) return true;
    if (lhsBlock > rhsBlock) return false;

    return TemplateRecursiveArrayCompare<I+1, S>()(lhs, rhs);
  }
};

template<std::size_t N>
class TemplateRecursiveBlockByBlockComparator
{
public:
  bool operator()(const SimpleBitSet<N>& x, const SimpleBitSet<N>& y) const
  {
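    // Start the recursion at block 0; starting at NUM_BLOCKS would select the
    // terminating specialization and always return false.
    return TemplateRecursiveArrayCompare<0, SimpleBitSet<N>::NUM_BLOCKS>()(x.allBlocks, y.allBlocks);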
  }
};

/* Construction, timing, and verification code */
int main()
{
  srand(0);

  const int BITSET_SIZE = 4096;

  std::cout << "Constructing..." << std::endl;

  // Fill a vector with random bitsets
  const int NUMBER_TO_PROCESS = 10000;
  const int SAMPLES_TO_FILL = BITSET_SIZE;
  std::vector<std::bitset<BITSET_SIZE> > allBitSets(NUMBER_TO_PROCESS);
  std::vector<SimpleBitSet<BITSET_SIZE> > allSimpleBitSets(NUMBER_TO_PROCESS);
  for (int k=0; k<NUMBER_TO_PROCESS; ++k)
    {
      std::bitset<BITSET_SIZE> bs;
      SimpleBitSet<BITSET_SIZE> homemadeBs;
      for (int j=0; j<SAMPLES_TO_FILL; ++j)
        {
          int indexToAdd = rand()%BITSET_SIZE;
          bs[indexToAdd] = true;
          homemadeBs.addItem(indexToAdd);
        }

      allBitSets[k] = bs;
      allSimpleBitSets[k] = homemadeBs;
    }

  clock_t t1,t2,t3,t4;
  t1=clock();

  std::cout << "Sorting using bit-by-bit compare and std::bitset..."  << std::endl;
  const int NUMBER_REPS = 100;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), BitByBitComparator<BITSET_SIZE>());
    }

  t2=clock();

  std::cout << "Sorting block-by-block using SimpleBitSet..."  << std::endl;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allSimpleBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), BlockByBlockComparator<BITSET_SIZE>());
    }

  t3=clock();

  std::cout << "Sorting block-by-block w/ template recursion using SimpleBitSet..."  << std::endl;
  for (int rep = 0; rep<NUMBER_REPS; ++rep)
    {
      auto tempCopy = allSimpleBitSets;
      std::sort(tempCopy.begin(), tempCopy.end(), TemplateRecursiveBlockByBlockComparator<BITSET_SIZE>());
    }

  t4=clock();

  std::cout << std::endl << "RUNTIMES:" << std::endl;
  std::cout << "\tstd::bitset        \t" << t2-t1 << std::endl;
  std::cout << "\tBlock-by-block     \t" << t3-t2 << std::endl;
  std::cout << "\tTemplate recursive \t" << t4-t3 << std::endl;
  std::cout << std::endl;

  std::cout << "Checking result... ";
  std::sort(allBitSets.begin(), allBitSets.end(), BitByBitComparator<BITSET_SIZE>());
  auto copy = allSimpleBitSets;
  std::sort(allSimpleBitSets.begin(), allSimpleBitSets.end(), BlockByBlockComparator<BITSET_SIZE>());
  std::sort(copy.begin(), copy.end(), TemplateRecursiveBlockByBlockComparator<BITSET_SIZE>());
  for (int k=0; k<NUMBER_TO_PROCESS; ++k)
    {
      auto stdBitSet = allBitSets[k];
      auto blockBitSet = allSimpleBitSets[k];
      auto tempRecBlockBitSet = copy[k];  // result of the template recursive sort

      for (int j=0; j<BITSET_SIZE; ++j)
        if (stdBitSet[j] != blockBitSet.getItem(j) || blockBitSet.getItem(j) != tempRecBlockBitSet.getItem(j))
          std::cerr << "error: sorted order does not match" << std::endl;
    }
  std::cout << "success" << std::endl;

  return 0;
}

Compiled with `-O3` with a recent gcc, the second option is fastest, with the third very close, and the first at 1.5× the second. — Michaël, May 01 '21 at 01:16

waTeim · Answer 4 · 2014-01-21T00:49:39.397

Though you say bitset, aren't you really talking about arbitrary-precision unsigned integer comparison? If so, then you're probably not going to easily do better than wrapping GMP.

From their website:

GMP is carefully designed to be as fast as possible, both for small operands and for huge operands. The speed is achieved by using fullwords as the basic arithmetic type, by using fast algorithms, with highly optimised assembly code for the most common inner loops for a lot of CPUs, and by a general emphasis on speed.

Consider their integer functions

Łukasz Kidziński · Answer 5 · 2014-01-20T22:44:52.470

How about checking the highest bit of XOR?

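template<std::size_t N>
int fls(const std::bitset<N>& n) {
    // find the last (highest) set bit; -1 if none (simple portable loop)
    for (int i = N-1; i >= 0; i--) {
        if (n[i]) return i;
    }
    return -1;
}

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    int i = fls(x^y);   // highest bit where x and y differ
    return i >= 0 && y[i];
}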

Some ideas for fls can be found here http://uwfsucks.blogspot.be/2007/07/fls-implementation.html.

edited Jan 20 '14 at 22:44

answered Jan 20 '14 at 22:16

Problem: optimizing `fls` requires internal access to bitset just as much as the original question. – Ben Voigt Jan 21 '14 at 00:21
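To illustrate the XOR idea at the level where it can be made fast, here is a sketch for a single 64-bit word. The names `fls64` and `less_by_highest_diff` are hypothetical, and the loop is a portable stand-in for whatever count-leading-zeros facility a given compiler offers. Applying this to a std::bitset wider than 64 bits still needs word access, which is exactly the problem raised in the comment above.

```cpp
#include <cstdint>

// Index of the highest set bit of w, or -1 if w == 0.
inline int fls64(std::uint64_t w)
{
    int index = -1;
    while (w != 0) {
        w >>= 1;
        ++index;
    }
    return index;
}

// x < y as unsigned integers, decided by the highest bit where they differ:
// if that bit is set in y, then y is the larger value.
inline bool less_by_highest_diff(std::uint64_t x, std::uint64_t y)
{
    const int i = fls64(x ^ y);
    return i >= 0 && ((y >> i) & 1u) != 0;
}
```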

score 2 · Answer 6 · answered Sep 15 '18 at 18:49

Well, there's good old memcmp. It's fragile in the sense that it depends on the implementation of std::bitset, and therefore might be unusable. But it's reasonable to assume the template creates an opaque array of ints and has no other bookkeeping fields.

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
    int cmp = std::memcmp(&x, &y, sizeof(x));
    return (cmp < 0);
}

This will uniquely determine an ordering for bitsets. But it might not be a human intuitive order. It depends on which bits are used for which set member index. For example, index 0 could be the LSB of the first 32 bit integer. Or it could be the LSB of the first 8 bit byte.

I strongly recommend unit tests to ensure this actually works for how it's used. ;->
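A small check along those lines, written against plain 64-bit words so that it does not depend on the layout of std::bitset. The helper names are assumptions for illustration. It shows why the memcmp order can differ from numeric order: memcmp compares bytes in address order, and on a little-endian machine the first byte is the least significant one.

```cpp
#include <cstdint>
#include <cstring>

// True if the machine stores the least significant byte first.
inline bool is_little_endian()
{
    const std::uint32_t one = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &one, 1);
    return first_byte == 1;
}

// Does the memcmp order agree with the numeric order for this pair of words?
inline bool memcmp_agrees(std::uint64_t a, std::uint64_t b)
{
    const bool by_memcmp = std::memcmp(&a, &b, sizeof a) < 0;
    return by_memcmp == (a < b);
}
```

On a little-endian machine, `memcmp_agrees(1, 256)` is false: the first byte of 1 is 0x01 and the first byte of 256 is 0x00, so memcmp orders 256 before 1.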

score 0 · Answer 7 · answered Jan 17 '20 at 22:15

Only performing the bitwise comparison if the two bitsets are different already yields some performance boost:

template<std::size_t N>
bool operator<(const std::bitset<N>& x, const std::bitset<N>& y)
{
        if (x == y)
                return false;
        ….
}

It doesn't if they are always different. – Sopel Feb 06 '20 at 11:00
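A sketch of how the early exit could be combined with the bit-by-bit loop from the accepted answer. The name `less_with_early_out` is made up, and the pairing with that particular loop is an assumption, since the answer leaves the rest of the body open. The benefit relies on operator== being cheaper than the loop, which is common but not guaranteed, and as the comment notes it only pays off when equal operands actually occur.

```cpp
#include <bitset>
#include <cstddef>

template <std::size_t N>
bool less_with_early_out(const std::bitset<N>& x, const std::bitset<N>& y)
{
    if (x == y) return false;          // equal operands never reach the loop
    for (std::size_t i = N; i-- > 0; ) {
        if (x[i] != y[i]) return y[i]; // highest differing bit decides
    }
    return false;                      // not reached when x != y
}
```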

score 0 · Answer 8 · answered Feb 06 '20 at 10:07

I know it's a bit of an old question, but if you know the maximum size of a bitset you could create something like this:

class Bitset{
    vector<bitset<64>> bits;
    /*
     * operators that you need
    */
};

This allows you to convert each of the bitset<64> chunks to unsigned long long for a quick compare. If you want to get to a specific bit (in order to change it or whatever), you can use bits[id / 64][id % 64].
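One way this container could be fleshed out is sketched below. The class name `ChunkedBitset`, its members, and the convention that chunk 0 holds bits 0..63 are assumptions for illustration. The comparison assumes both operands were created with the same number of bits.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

class ChunkedBitset {
public:
    explicit ChunkedBitset(std::size_t num_bits) : bits_((num_bits + 63) / 64) {}

    void set(std::size_t id) { bits_[id / 64].set(id % 64); }
    bool test(std::size_t id) const { return bits_[id / 64].test(id % 64); }

    // Unsigned-integer order: compare 64-bit chunks, most significant first.
    friend bool operator<(const ChunkedBitset& l, const ChunkedBitset& r)
    {
        for (std::size_t k = l.bits_.size(); k-- > 0; ) {
            const unsigned long long a = l.bits_[k].to_ullong();
            const unsigned long long b = r.bits_[k].to_ullong();
            if (a != b) return a < b;
        }
        return false;  // all chunks equal
    }

private:
    std::vector<std::bitset<64> > bits_;  // chunk 0 holds bits 0..63
};
```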

paperclip optimizer · Answer 9 · 2023-01-20T07:45:57.293

The underlying implementation of bitset uses uint64 in virtually all 64-bit CPUs, compilers, etc. since there is only one sensible way to write an implementation of the class with the given interface, which makes it easy to figure out a "portable" hack.

So assuming you just want the "obvious" efficient way to do it and your code won't be used to control nuclear arsenal, knowing full well this will void your warranty, yadda yadda yadda, here is the code you are looking for:

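template <size_t N> bool operator<(const bitset<N> & a, const bitset<N> & b) {

    const uint64_t * p = (const uint64_t *)(&a);
    const uint64_t * q = (const uint64_t *)(&b);

    // number of 64-bit words in the object
    size_t i = (sizeof(bitset<N>)-1)/sizeof(uint64_t) + 1;

    // scan from the most significant word down; stop at the first difference
    while (i-- > 0) {
        if (p[i] != q[i]) return p[i] < q[i];
    }

    return false; // all words equal (the original loop read before the array here)
}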

Basically cast to a uint64 array and compare element by element in reverse order until you find a discrepancy.

Also beware this assumes x86-64 endianness.
