6

I want to use a cache, implemented with Boost's unordered_map, that maps a dynamic_bitset to a dynamic_bitset. The problem, of course, is that there is no default hash function for the bitset. It doesn't seem to be a conceptual problem, but I don't know how to work out the technicalities. How should I do that?

Adam Stelmaszczyk
R S
  • See https://svn.boost.org/trac/boost/ticket/2841. – kennytm Oct 09 '10 at 16:19
  • I can't use this since m.bits is a private member (the suggestion is for a change in dynamic_bitset). – R S Oct 09 '10 at 16:31
  • m.bits should be public const, that's pretty stupid! Can you get away with using vector<bool> (which is a bitset, but one that works MUCH nicer) as the key? – Mahmoud Al-Qudsi Oct 09 '10 at 16:46
  • I'm using a bitset since I am doing a lot of set calculations (intersections etc.), which are much much faster when done bitwise. So I guess not. – R S Oct 09 '10 at 16:47

5 Answers

6

I found an unexpected solution. It turns out boost has an option to #define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS. When this is defined, private members including m_bits become public (I think it's there to deal with old compilers or something).

So now I can use @KennyTM's answer, changed a bit:

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {            
        return boost::hash_value(bs.m_bits);
    }
}
R S
3

There's a to_block_range function that copies the words the bitset consists of into a buffer. To avoid the actual copying, you could define your own "output iterator" that just processes the individual words and computes a hash from them. As for how to compute the hash, see e.g. the FNV hash function.

Unfortunately, the design of dynamic_bitset is, IMHO, braindead because it does not give you direct access to the underlying buffer (not even as const).
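
A minimal sketch of that output-iterator idea, assuming a 64-bit FNV-1a hash. The names fnv1a_block_sink and hash_blocks are hypothetical helpers, not part of Boost, and a plain iterator range stands in for the bitset so the sketch compiles without Boost. With a real dynamic_bitset, the sink would presumably be passed as the second argument of boost::to_block_range in place of the std::copy call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of Boost): an output iterator that folds every
// block written to it into a 64-bit FNV-1a state instead of storing the block.
template <typename Block>
class fnv1a_block_sink {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit fnv1a_block_sink(std::uint64_t& state) : state_(&state) {}

    // "Writing" a block mixes its bytes into the hash, least significant first.
    fnv1a_block_sink& operator=(Block block) {
        for (std::size_t i = 0; i < sizeof(Block); ++i) {
            *state_ ^= static_cast<std::uint64_t>((block >> (8 * i)) & 0xFFu);
            *state_ *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
        return *this;
    }

    // Dereference and increment are no-ops, as is usual for sink iterators.
    fnv1a_block_sink& operator*() { return *this; }
    fnv1a_block_sink& operator++() { return *this; }
    fnv1a_block_sink& operator++(int) { return *this; }

private:
    std::uint64_t* state_;  // points at the caller's state, so copies share it
};

// Hash a range of blocks without copying them into a temporary buffer.
template <typename BlockIt>
std::uint64_t hash_blocks(BlockIt first, BlockIt last) {
    using Block = typename std::iterator_traits<BlockIt>::value_type;
    std::uint64_t state = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    std::copy(first, last, fnv1a_block_sink<Block>(state));
    return state;
}
```

The sink keeps a pointer to the caller's state because algorithms take output iterators by value. Note that this hashes the blocks only; the number of bits in the bitset is not part of the result.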

zvrba
  • Would it really be that hard to just copy-paste the dynamic_bitset header file and rename it to "my_dynamic_bitset", where the only difference is that the members aren't private anymore? – R S Oct 09 '10 at 16:51
  • It's a maintenance problem. You have to repeat the same procedure each time the mainstream file gets updated for any reason. – zvrba Oct 09 '10 at 18:27
3

It is a feature request.

One could implement a not-so-efficient unique hash by converting the bitset to a temporary vector:

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        std::vector<B, A> v;
        boost::to_block_range(bs, std::back_inserter(v));
        return boost::hash_value(v);
    }
}
kennytm
  • How do I use that? I tried "unordered_map > cache;" but it still doesn't compile. – R S Oct 09 '10 at 17:22
2

We can't directly calculate the hash because the underlying data in dynamic_bitset (m_bits) is private.

But we can easily finesse past (subvert!) the C++ access specification system without either

  • hacking at the code or
  • pretending your compiler is non-conforming (BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS)

The key is the template function to_block_range which is a friend to dynamic_bitset. Specialisations of this function, therefore, also have access to its private data (i.e. m_bits).

The resulting code couldn't be simpler:

namespace boost {


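// Specialise to_block_range for size_t& to return the hash of the underlying data.
template <>
inline void
to_block_range(const dynamic_bitset<>& bs, size_t& hash_result)
{
    hash_result = boost::hash_value(bs.m_bits);
}

inline std::size_t hash_value(const boost::dynamic_bitset<>& bs)
{
    size_t hash_result;
    // Name the template arguments explicitly: plain deduction would pick
    // BlockOutputIterator = size_t (by value) and miss the specialisation.
    to_block_range<dynamic_bitset<>::block_type,
                   dynamic_bitset<>::allocator_type,
                   size_t&>(bs, hash_result);
    return hash_result;
}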
}
Leo Goodstadt
  • Unfortunately, this doesn't appear to be true. The specific specialization of the to_block_range function is a friend of dynamic_bitset. It is not possible to have another function by the same name with different parameters while maintaining the access of a friend function. – BSchlinker Jan 26 '13 at 11:42
  • @BSchlinker I disagree: `boost::dynamic_bitset` is declared as: `template <typename Block = unsigned long, typename Allocator = std::allocator<Block> > class dynamic_bitset;` – Leo Goodstadt Jan 29 '14 at 18:56
  • @BSchlinker: The original _befriended_ template function is: `template <typename B, typename A, typename BlockOutputIterator> friend void to_block_range(const dynamic_bitset<B, A>& b, BlockOutputIterator result);` Thus the specialisation in `template <> inline void to_block_range(const dynamic_bitset<>&, tuple)` means `typename B = unsigned long`, `typename A = std::allocator<unsigned long>`, `typename BlockOutputIterator = tuple`. Looks like cheating and very naughty... but legitimate C++. – Leo Goodstadt Jan 29 '14 at 19:01
  • @BSchlinker. Also it compiles! (GCC 4.8 and clang 3.4) – Leo Goodstadt Jan 29 '14 at 19:04
0

The proposed solution generates the same hash for different bitsets in the following situation.

#define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {            
        return boost::hash_value(bs.m_bits);
    }
}

boost::dynamic_bitset<> test(1,false);

auto hash1 = boost::hash_value(test);

test.push_back(false);

auto hash2 = boost::hash_value(test);

// ... keep pushing back false bits ...

test.push_back(false);

auto hash31 = boost::hash_value(test);

// magically all hash1 to hash31 are the same!

So the proposed solution is sometimes unsuitable for a hash map.

I read the dynamic_bitset source to find out why this happens, and realized that dynamic_bitset stores one bit per value, the same as vector<bool>. For example, when you call dynamic_bitset<> test(1, false), dynamic_bitset initially allocates one 4-byte block of all zeros and records the number of bits (in this case, 1). Once the number of bits exceeds 32, it allocates another 4-byte block and pushes it back into dynamic_bitset<>::m_bits (so m_bits is a vector of 4-byte blocks). These figures assume a 32-bit block; the default block type is unsigned long, so the actual block size depends on the platform.

If I call test.push_back(x), it sets the second bit to x and increases the number of bits to 2. If x is false, m_bits[0] does not change at all! To compute the hash correctly, we need to include m_num_bits in the hash computation.

Then, the question is how?

1: Use boost::hash_combine. This approach is simple and straightforward. I have not checked whether it compiles.

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) { 
        size_t tmp = 0;
        boost::hash_combine(tmp,bs.m_num_bits);           
        boost::hash_combine(tmp,bs.m_bits);
        return tmp;
    }
}

2: Flip the (m_num_bits % bits_per_block)-th bit of the hash, i.e. flip one bit chosen by the bit count. I believe this approach is faster than 1.

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        // You may need a more sophisticated bit-shift approach.
        // Use a size_t mask so the shift stays in range for 64-bit blocks.
        const std::size_t bit = std::size_t(1) << (bs.m_num_bits % bs.bits_per_block);

        // Toggle the selected bit of the block hash (set it if clear, clear it if set).
        return boost::hash_value(bs.m_bits) ^ bit;
    }
}
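
The point behind both approaches, that the bit count has to feed into the hash alongside the blocks, can be sketched without touching private members. The following is only an illustration under assumptions: fnv1a_mix and hash_bits are hypothetical helpers, FNV-1a is swapped in for boost::hash_combine, and a plain std::vector of blocks stands in for the bitset so the sketch compiles without Boost. With a real dynamic_bitset, bs.size() and boost::to_block_range would presumably supply the two inputs through the public interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: mix one unsigned value into a 64-bit FNV-1a state,
// byte by byte, least significant byte first.
template <typename T>
void fnv1a_mix(std::uint64_t& state, T value) {
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        state ^= static_cast<std::uint64_t>((value >> (8 * i)) & 0xFFu);
        state *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
}

// Hash the bit count first and the blocks second, so that two bitsets whose
// blocks are identical but whose lengths differ get different hashes.
template <typename Block>
std::uint64_t hash_bits(std::size_t num_bits, const std::vector<Block>& blocks) {
    std::uint64_t state = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    fnv1a_mix(state, num_bits);
    for (Block block : blocks) {
        fnv1a_mix(state, block);
    }
    return state;
}
```

With this, a one-bit and a two-bit all-false bitset (both stored as a single zero block) no longer collide, which is exactly the case that breaks the block-only hash above.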
mossan