
Problem domain

I have (potentially) long lists of pairings of data that I need to merge (and perform some logic on) such that there are no duplicates. The pairings used to be of int type, but because the amount of data has grown, I'm converting them to pairings of size_t, so my data type is now declared as pair<size_t, size_t>.

The code previously checked for uniqueness by having a hash_set and probing into it to see if a given pairing has already been seen and processed. As a key, it conveniently used INT64 and built a key using bit shifting and packing:

INT64 key = ((INT64)pairsListEntry->first) << 32 | pairsListEntry->second;

This worked well, since two ints fit perfectly into an INT64 and yield a unique key. With two size_t values (64 bits each on x64) that no longer holds, because both halves cannot fit into a single INT64.
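To make the failure concrete, here is a minimal sketch of the same packing applied to 64-bit halves. It assumes std::uint64_t as a stand-in for size_t on x64, and pack_key is a hypothetical helper name used only for illustration.

```cpp
#include <cstdint>

// Same shift-and-or packing as the original key, but with 64-bit halves.
// Hypothetical helper, for illustration only.
inline std::uint64_t pack_key(std::uint64_t first, std::uint64_t second)
{
    // The upper 32 bits of `first` are shifted out of the result, and any
    // bits of `second` above bit 31 overlap the bits contributed by `first`.
    return (first << 32) | second;
}
```

With this packing, pack_key(1, 0) and pack_key(0, 1ULL << 32) produce the same value, so distinct pairs can no longer be told apart by the key alone.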

Immediate problem

To adjust for the new size, I tried to refactor and declare my hash_set as:

std::hash_set<pair<size_t, size_t>> m_seenPairs;

This, however, fails when compiling code that creates an instance of that class with the following error message:

C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\xhash(71) : error C2440: 'type cast' : cannot convert from 'const std::pair<_Ty1,_Ty2>' to 'size_t'

This occurs deep inside STL's implementation, in the following function (on the return line):

template<class _Kty> inline
size_t hash_value(const _Kty& _Keyval)
{   // hash _Keyval to size_t value one-to-one
return ((size_t)_Keyval ^ _HASH_SEED);
}

The cause is pretty clear: pair<T1, T2> doesn't know how to cast to size_t in order to calculate the hash code.

At this point, I'm stuck on how to get it to work. My Google-fu isn't turning up much. I saw a couple of posts on SO about std::map and pair, but there it seems "to just work".

The environment is VS2008, building an unmanaged x64 target.

Things I tried

I tried to provide my own comparer, based on a post that looked at least remotely similar:

struct pairs_equal_compare
{
    bool operator()(const pair<SampleIdIndex_t, SampleIdIndex_t> & p1, const pair<SampleIdIndex_t, SampleIdIndex_t> & p2) const
    {
        return (p1.first == p2.first) && (p1.second == p2.second);
    }
};

// Holds a set of pairs that are known to exist for deduplication purposes.
stdext::hash_set<pair<SampleIdIndex_t, SampleIdIndex_t>, 
    stdext::hash_compare<pair<SampleIdIndex_t, SampleIdIndex_t>, pairs_equal_compare>> m_seenPairs;

Once I got the declarations and the struct right, this resulted in exactly the same error. I realize now that it doesn't help bypass the internal call to hash_value that calculates the hash code.

I also briefly tried using pairs_equal_compare in place of hash_compare, but this produced more compilation errors and looks like the wrong direction to go...

It seems there has to be a reasonable way to get hash_set to work on pair (or any non-integer data type), but how to accomplish that eludes me.

LB2
  • The code in the post you linked looks nothing like yours. He has `hash_set` and you have `hash_set>` – Mooing Duck May 23 '14 at 18:59
  • @ildjarn From what I've gathered (it's been long since I did `VC++` development) STL went through major changes. But it seems a bit of a leap to say that in VS2008 `hash_set` is limited to integer types, or is it? – LB2 May 23 '14 at 18:59
  • @MooingDuck I did notice that - but the funny thing is that in that post `hash_set` declaration is not given, but implies template list to take underlying type followed by two simple functors. Under VS2008, template list is template >, class _Alloc = _STD allocator<_Kty> > with last one being allocator rather than functor. So definitely a mismatch here (that I don't know how to even explain - that's beyond my knowledge of STL). – LB2 May 23 '14 at 19:03
  • @ildjarn VS2008 has it under `stdext` namespace rather than `std`. It definitely recognizes `hash_set` type and it did build and run when it was `hash_set`. So it is definitely **not** due to unknown or undefined type. – LB2 May 23 '14 at 19:05
  • @LB2: That's a very good point, the linked code doesn't remotely work – Mooing Duck May 23 '14 at 19:12
  • @LB2 If the VS neanderthal uses the same deprecated implementation as gnu cxx does, [you may find this helpful](http://pastebin.com/GxCRm1Je). – WhozCraig May 23 '14 at 19:12
  • My mistake, I was thinking of `unordered_set` all along. You're right, `hash_set` only works with types implicitly convertible to `size_t` by default; you'll need to supply a custom `Traits` argument, with a custom hash object for `std::pair<>` for its first argument (the second, default argument is fine). – ildjarn May 23 '14 at 19:14
  • @ildjarn I believe I have it indirectly (there are a bunch of includes) because `pair` which is declared there resolves without issues, and `F12` takes me straight to it in `utility`. – LB2 May 23 '14 at 19:15
  • @ildjarn I did try to use my `pairs_equal_compare` in place of `hash_compare`, but that turned out to be quite wrong, as it's not just a functor but a rather complex 'service' that I don't quite know how to write. Is this something you can provide the gist of in an example? – LB2 May 23 '14 at 19:23

2 Answers


Out of the box, stdext::hash_set<> only works with types that are implicitly convertible to size_t. For std::pair<>, you'll need to supply an argument to stdext::hash_compare<> (for stdext::hash_set<>'s Traits parameter) that behaves as such, since std::pair<> itself does not.

The following works for me with VS2013, and I don't see why it wouldn't also work with VS2008:

#include <cstddef>
#include <utility>
#include <hash_set>

struct pair_hasher
{
    typedef std::pair<std::size_t, std::size_t> value_type;

    value_type value;

    pair_hasher(value_type const& v) : value(v) { }

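    operator std::size_t() const
    {
        // djb2-style xor mixing; the parentheses only spell out the
        // evaluation order that operator precedence already gives.
        return (((5381 * 33) ^ value.first) * 33) ^ value.second;
    }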
};

bool operator <(pair_hasher const& a, pair_hasher const& b)
{
    return a.value < b.value;
}

Then you'll need to declare your stdext::hash_set<> instance as such:

stdext::hash_set<
    std::pair<std::size_t, std::size_t>,
    stdext::hash_compare<pair_hasher>
> s;

For types other than integral types for the std::pair<>, update pair_hasher::operator std::size_t as necessary (operator < should be fine as-is, as long as the types within the std::pair<> are themselves already comparable).
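As a rough illustration of that last point, here is a sketch of how the hasher might be adapted for a pair whose first member is a std::string. The name string_pair_hasher and the choice of a djb2-style loop over the characters are assumptions made for the example, and it has not been tried against stdext::hash_set itself.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical variant of pair_hasher for std::pair<std::string, std::size_t>.
struct string_pair_hasher
{
    typedef std::pair<std::string, std::size_t> value_type;

    value_type value;

    string_pair_hasher(value_type const& v) : value(v) { }

    operator std::size_t() const
    {
        // djb2-style (xor variant) hash over the string's characters...
        std::size_t h = 5381;
        for (std::string::size_type i = 0; i < value.first.size(); ++i)
            h = (h * 33) ^ static_cast<unsigned char>(value.first[i]);
        // ...then fold in the integral second member the same way.
        return (h * 33) ^ value.second;
    }
};

// Ordering is unchanged: std::pair already compares lexicographically.
inline bool operator <(string_pair_hasher const& a, string_pair_hasher const& b)
{
    return a.value < b.value;
}
```

Only the conversion operator changed; operator < still delegates to std::pair's own comparison, which works because std::string and std::size_t are both comparable.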

ildjarn

You can also use a suitable Traits object that behaves like hash_compare, i.e., it must define two operator()s:

size_t operator()(const Key &key) const; // This one returns the hash of key
bool operator()(const Key &first, 
                const Key &second) const; // This one returns true if first is less than second

and two integer constants, which you can probably just take from the default implementation:

static const size_t bucket_size = 4;
static const size_t min_buckets = 8;

See the documentation of hash_compare.

The code would look like

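struct pair_comparator{
    typedef std::pair<std::size_t, std::size_t> Key;
    // Hash of key (placeholder: supply your own mixing here).
    std::size_t operator()(const Key &key) const { return /* your hash code here */; }
    // Strict ordering: true if first is less than second.
    bool operator()(const Key &first,
                    const Key &second) const { return first < second; }
    // Must be static: C++03 does not allow in-class initializers on
    // non-static data members.
    static const std::size_t bucket_size = 4;
    static const std::size_t min_buckets = 8;
};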

stdext::hash_set<
    std::pair<std::size_t, std::size_t>,
    pair_comparator
> s;
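As a sketch of what the placeholder might become, the version below fills in the hash with the same djb2-style xor mixing used by pair_hasher in the other answer. The name pair_traits and the helper insert_once are hypothetical. Because stdext::hash_set is specific to Visual C++, the sketch exercises the functor through std::unordered_set (C++11) as a stand-in, which only uses the one-argument operator() as its hash.

```cpp
#include <cstddef>
#include <utility>
#include <unordered_set>

// Hypothetical filled-in Traits type; the hash mixing is an assumption,
// not something prescribed by hash_compare.
struct pair_traits
{
    typedef std::pair<std::size_t, std::size_t> Key;

    // Hash: djb2-style xor mixing of both members.
    std::size_t operator()(const Key& key) const
    {
        std::size_t h = 5381;
        h = (h * 33) ^ key.first;
        h = (h * 33) ^ key.second;
        return h;
    }

    // Ordering: true if first is less than second.
    bool operator()(const Key& first, const Key& second) const
    {
        return first < second;
    }

    // Values the answer above takes from the default implementation.
    static const std::size_t bucket_size = 4;
    static const std::size_t min_buckets = 8;
};

// On VS2008 the type would be passed as the Traits argument:
//   stdext::hash_set<pair_traits::Key, pair_traits> seen;

// Portable stand-in: returns true only the first time a pair is seen.
inline bool insert_once(std::unordered_set<pair_traits::Key, pair_traits>& seen,
                        const pair_traits::Key& p)
{
    return seen.insert(p).second;  // false when p was already present
}
```

The deduplication check from the question then reduces to calling insert_once (or the equivalent insert on the hash_set) and processing the pair only when it returns true.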

Edit: The documentation says that you can also derive from a specialization of hash_compare, and only override the members you don't like, so:

struct pair_comparator : public stdext::hash_compare<std::pair<std::size_t, std::size_t> >{
    typedef std::pair<std::size_t, std::size_t> Key;
    size_t operator()(const Key &key) const { return /* your hash code here */; }
    bool operator()(const Key &first, 
                const Key &second) const { return first < second; }
};

This should avoid having to define the two integer constants yourself.

T.C.
  • Except for having to manually provide `bucket_size` and `min_buckets`, I do like this solution better – much less boilerplate. – ildjarn May 23 '14 at 19:54
  • @T.C. I'm trying to implement this (the one in the edit), and now I'm getting a different error: `2>C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\xhash(661) : error C2664: 'bool stdext::hash_compare<_Kty>::operator ()(const _Kty &,const _Kty &) const' : cannot convert parameter 1 from 'const std::pair<_Ty1,_Ty2>' to 'const CMainApplication::pair_comparator &'` in the `_Paircc _Hash::equal_range(const key_type& _Keyval) const` function, at the line `if (!this->comp(this->_Kfn(*_Where), _Keyval))`... – LB2 May 23 '14 at 20:45
  • @LB2 Did you pass `hash_compare<pair_comparator>` instead of just `pair_comparator`? – T.C. May 23 '14 at 20:53
  • @T.C. Yes: `typedef stdext::hash_set<pair<SampleIdIndex_t, SampleIdIndex_t>, stdext::hash_compare<pair_comparator>> SeenPairsHashSet; SeenPairsHashSet *m_dedupSeenPairs;`. And `SampleIdIndex_t` is just a `typedef size_t`. – LB2 May 23 '14 at 21:01
  • @LB2 You should do `typedef stdext::hash_set<pair<SampleIdIndex_t, SampleIdIndex_t>, pair_comparator> SeenPairsHashSet;`. – T.C. May 23 '14 at 21:12
  • @T.C. That made it compile now, thank you - running to see if it works now. – LB2 May 23 '14 at 21:18