Problem domain
I have a (potentially) long list of pairings of data that I need to merge (and perform some logic on) such that there are no duplicates. The pairings were of an int type, but due to growth in the amount of data, I'm converting them to pairings of size_t, and thus my data type is now declared as pair<size_t, size_t>.
The code previously checked for uniqueness by having a hash_set and probing into it to see if a given pairing had already been seen and processed. As a key, it conveniently used INT64 and built the key using bit shifting and packing:
INT64 key = ((INT64)pairsListEntry->first) << 32 | pairsListEntry->second;
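This worked well since two ints fit perfectly into an INT64, resulting in a unique key. But on an x64 target each size_t is itself 64 bits wide, so two of them can no longer be packed into a single 64-bit key, and this approach doesn't work anymore.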
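To make the failure concrete, here is a minimal sketch of the old packing next to a naive 64-bit version of it. The names Key64, pack32 and pack64_naive are hypothetical and only for illustration (Key64 stands in for INT64), and the sketch assumes non-negative values:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the INT64 key type used in the original code.
typedef unsigned long long Key64;

// Old scheme: two 32-bit values occupy disjoint halves of a 64-bit key,
// so distinct pairs always produce distinct keys.
Key64 pack32(unsigned int first, unsigned int second)
{
    return (static_cast<Key64>(first) << 32) | second;
}

// Same formula applied to 64-bit values: the upper 32 bits of 'first'
// are shifted out and 'second' overlaps what is left, so distinct
// pairs can collapse onto the same key.
Key64 pack64_naive(Key64 first, Key64 second)
{
    return (first << 32) | second;
}
```

With this sketch, the pairs (1, 0) and (0, 1ULL << 32) both map to the same key under pack64_naive, while pack32 keeps (1, 0) and (0, 1) apart.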
Immediate problem
To adjust for the new sizing, I tried to refactor and declare my hash_set as:
std::hash_set<pair<size_t, size_t>> m_seenPairs;
This, however, fails when compiling code that creates an instance of that class with the following error message:
C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include\xhash(71) : error C2440: 'type cast' : cannot convert from 'const std::pair<_Ty1,_Ty2>' to 'size_t'
This occurs deep inside the STL's implementation, in the following function (on the return line):
template<class _Kty> inline
    size_t hash_value(const _Kty& _Keyval)
    {   // hash _Keyval to size_t value one-to-one
    return ((size_t)_Keyval ^ _HASH_SEED);
    }
The cause is pretty clear: pair<T1, T2> cannot be cast to size_t, which this generic hash_value needs in order to calculate the hash code.
At this point, I'm stuck on how to get it to work. My Google-fu isn't turning up much. I saw a couple of posts on SO with std::map and pair, but there it seems to "just work".
The environment is VS2008, x64 platform unmanaged target.
Things I tried
I tried to provide my own comparer, since I had seen a post that looked at least remotely similar:
struct pairs_equal_compare
{
    bool operator()(const pair<SampleIdIndex_t, SampleIdIndex_t> & p1, const pair<SampleIdIndex_t, SampleIdIndex_t> & p2) const
    {
        return (p1.first == p2.first) && (p1.second == p2.second);
    }
};
// Holds a set of pairs that are known to exist for deduplication purposes.
stdext::hash_set<pair<SampleIdIndex_t, SampleIdIndex_t>,
    stdext::hash_compare<pair<SampleIdIndex_t, SampleIdIndex_t>, pairs_equal_compare>> m_seenPairs;
This (once I got the declarations and the struct properly declared) resulted in exactly the same error. I realize now that a comparer doesn't bypass the internal call to hash_value that calculates the hash code.
I also briefly tried using pairs_equal_compare in place of hash_compare, but this produced more compilation errors and looks like the wrong direction to go...
It seems there has to be a reasonable way to get hash_set to work on pair (or any other non-integer data type), but how to accomplish that eludes me.
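For reference, this is roughly the shape of what I'm after, sketched with the standard std::unordered_set rather than stdext::hash_set, so it assumes a C++11 compiler and is not something I have working in VS2008. PairHash, SeenPairs and markSeen are hypothetical names, and the mixing step follows the style of boost::hash_combine as an assumption rather than a recommendation:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>

// Hypothetical hash functor: hashes each element and mixes the results.
// The hash need not be unique; collisions are resolved by pair's operator==.
struct PairHash
{
    std::size_t operator()(const std::pair<std::size_t, std::size_t>& p) const
    {
        std::size_t h1 = std::hash<std::size_t>()(p.first);
        std::size_t h2 = std::hash<std::size_t>()(p.second);
        // Mixing in the style of boost::hash_combine (an assumption here).
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

typedef std::unordered_set<std::pair<std::size_t, std::size_t>, PairHash> SeenPairs;

// Records the pair and returns true only if it had not been seen before.
bool markSeen(SeenPairs& seen, std::size_t first, std::size_t second)
{
    return seen.insert(std::make_pair(first, second)).second;
}
```

The point of the sketch is that the container only needs some size_t hash for the pair plus an equality test, so the key no longer has to be a unique packed integer.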