I have a custom bitset class implementation in C++. I often iterate over the indices of the bits that are set (e.g. for the bitset '10011', reading left to right, I want to iterate over the positions 0, 3, and 4). This iteration can be implemented as follows:
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

constexpr int BITS = 64;  // bits per uint64_t chunk

struct Bitset {
  uint64_t* data_;
  size_t chunks_;

  std::vector<int> Elements() const {
    std::vector<int> ret;
    for (size_t i = 0; i < chunks_; i++) {
      uint64_t td = data_[i];
      while (td) {
        ret.push_back(i * BITS + __builtin_ctzll(td));
        td &= ~-td;  // clear the lowest set bit (same as td &= td - 1)
      }
    }
    return ret;
  }
};
void Iterate(Bitset bitset) {
  for (int b : bitset.Elements()) {
    std::cout << "bit: " << b << std::endl;
  }
}
The above implementation keeps the iteration code clean, but it involves an unnecessary heap allocation for the vector. The following version, which essentially inlines the Elements() function, is often faster:
void Iterate(Bitset bitset) {
  int chunks = bitset.chunks_;
  for (int i = 0; i < chunks; i++) {
    uint64_t td = bitset.data_[i];
    while (td) {
      std::cout << "bit: " << i * BITS + __builtin_ctzll(td) << std::endl;
      td &= ~-td;
    }
  }
}
What would be a good way to implement an abstraction for this iteration so that the call site stays as clean as the first version, but without the performance cost of the heap allocation?
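For reference, one direction I have been considering is a callback-taking member template, shown in a minimal sketch below (ForEach is just a placeholder name). The template parameter should let the compiler inline the lambda, so it ought to generate code close to the hand-inlined loop, but I am not sure whether this is the cleanest option compared to writing a proper iterator or range.

#include <cstddef>
#include <cstdint>
#include <iostream>

constexpr int BITS = 64;  // bits per uint64_t chunk

struct Bitset {
  uint64_t* data_;
  size_t chunks_;

  // Sketch: invoke the callable f for each set bit index.
  template <typename F>
  void ForEach(F&& f) const {
    for (size_t i = 0; i < chunks_; i++) {
      uint64_t td = data_[i];
      while (td) {
        f(static_cast<int>(i * BITS + __builtin_ctzll(td)));
        td &= ~-td;  // clear the lowest set bit
      }
    }
  }
};

void Iterate(const Bitset& bitset) {
  bitset.ForEach([](int b) { std::cout << "bit: " << b << std::endl; });
}

Is there a cleaner or more idiomatic abstraction than this?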