I have a custom bitset class implementation in C++. I often iterate over the indices of the bits that are set (e.g. for the bitset '10011', reading left to right, I want to iterate over the positions 0, 3, and 4). This iteration can be implemented as follows:
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

constexpr int BITS = 64;  // bits per uint64_t chunk

struct Bitset {
  uint64_t* data_;
  size_t chunks_;

  std::vector<int> Elements() const {
    std::vector<int> ret;
    for (size_t i = 0; i < chunks_; i++) {
      uint64_t td = data_[i];
      while (td) {
        ret.push_back(i * BITS + __builtin_ctzll(td));
        td &= ~-td;  // clear the lowest set bit (same as td &= td - 1)
      }
    }
    return ret;
  }
};
void Iterate(Bitset bitset) {
  for (int b : bitset.Elements()) {
    std::cout << "bit: " << b << std::endl;
  }
}
The above implementation keeps the iteration code clean, but it involves an unnecessary heap allocation for the vector. The following version, which essentially inlines the Elements() function, is often faster:
void Iterate(Bitset bitset) {
  int chunks = bitset.chunks_;
  for (int i = 0; i < chunks; i++) {
    uint64_t td = bitset.data_[i];
    while (td) {
      std::cout << "bit: " << i * BITS + __builtin_ctzll(td) << std::endl;
      td &= ~-td;
    }
  }
}
What would be a good way to implement an abstraction for this iteration so that the call site stays as clean as the first version, but without the performance cost of the heap allocation?
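For reference, one direction I have been considering is a callback-taking member template, shown in a minimal sketch below (ForEach is just a placeholder name). The template parameter should let the compiler inline the lambda, so it ought to generate code close to the hand-inlined loop, but I am not sure whether this is the cleanest option compared to writing a proper iterator or range.

#include <cstddef>
#include <cstdint>
#include <iostream>

constexpr int BITS = 64;  // bits per uint64_t chunk

struct Bitset {
  uint64_t* data_;
  size_t chunks_;

  // Sketch: invoke the callable f for each set bit index.
  template <typename F>
  void ForEach(F&& f) const {
    for (size_t i = 0; i < chunks_; i++) {
      uint64_t td = data_[i];
      while (td) {
        f(static_cast<int>(i * BITS + __builtin_ctzll(td)));
        td &= ~-td;  // clear the lowest set bit
      }
    }
  }
};

void Iterate(const Bitset& bitset) {
  bitset.ForEach([](int b) { std::cout << "bit: " << b << std::endl; });
}

Is there a cleaner or more idiomatic abstraction than this?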