
I would like to access/iterate over all non-unique keys in an unordered_multimap. The hash table is basically a map from a signature <SIG>, which does occur more than once in practice, to identifiers <ID>. I would like to find those entries in the hash table where <SIG> occurs more than once.

Currently I use this approach:

// map <SIG> -> <ID>
typedef unordered_multimap<int, int>    HashTable;
HashTable& ht = ...;
for(HashTable::iterator it = ht.begin(); it != ht.end(); ++it)
{
    size_t n=0;
    std::pair<HashTable::iterator, HashTable::iterator> itpair = ht.equal_range(it->first); 
    for (   ; itpair.first != itpair.second; ++itpair.first) {  
        ++n;
    }
    if( n > 1 ){ // access those items again as the previous iterators are not valid anymore
        std::pair<HashTable::iterator, HashTable::iterator> itpair = ht.equal_range(it->first); 
        for (   ; itpair.first != itpair.second; ++itpair.first) {  
           // do something with those items
        }
    }
}

This is certainly not efficient: the outer loop iterates over all elements of the hash table (via ht.begin()), and for each element the inner loop tests whether the corresponding key is present more than once.

Is there a more efficient or elegant way to do this?

Note: I know that with an unordered_map instead of an unordered_multimap I wouldn't have this issue, but due to application requirements I must be able to store multiple keys <SIG> pointing to different identifiers <ID>. Also, an unordered_map<SIG, vector<ID> > is not a good choice for me: it uses roughly 150% of the memory, as I have many unique keys and vector<ID> adds quite a bit of overhead for each item.

Stefan

2 Answers

Use std::unordered_multimap::count() to determine the number of elements with a specific key. This saves you the first inner loop.
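As a rough sketch of that suggestion, assuming the HashTable typedef from the question: count() replaces the hand-written counting loop, and because the outer loop already visits every entry, each entry with a shared key can be handled right there. The helper name ids_with_shared_sig is hypothetical and only stands in for "do something with those items".

```cpp
#include <unordered_map>
#include <vector>

// map <SIG> -> <ID>, as in the question
typedef std::unordered_multimap<int, int> HashTable;

// Hypothetical helper: collects the IDs stored under non-unique SIGs.
std::vector<int> ids_with_shared_sig(const HashTable& ht)
{
    std::vector<int> ids;
    for (HashTable::const_iterator it = ht.begin(); it != ht.end(); ++it) {
        // count() returns the number of entries with this key.
        if (ht.count(it->first) > 1) {
            ids.push_back(it->second); // stand-in for "do something"
        }
    }
    return ids;
}
```

Note that this handles each entry once, whereas the loop in the question processes a whole key group once per entry in that group.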

You cannot prevent iterating over the whole HashTable. For that, the HashTable would have to maintain a second index that maps cardinality to keys. This would introduce significant runtime and storage overhead and is only useful in a small number of cases.
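The full pass stays, but the number of lookups can be cut to one per distinct key. The sketch below relies on the standard's guarantee that entries with equivalent keys are adjacent in the iteration order of an unordered_multimap, so the loop can jump from the end of one key group to the start of the next. The helper name entries_with_shared_sig is hypothetical.

```cpp
#include <iterator>
#include <unordered_map>
#include <utility>
#include <vector>

// map <SIG> -> <ID>, as in the question
typedef std::unordered_multimap<int, int> HashTable;

// Hypothetical helper: returns every (SIG, ID) entry whose SIG occurs more than once.
std::vector<std::pair<int, int> > entries_with_shared_sig(const HashTable& ht)
{
    std::vector<std::pair<int, int> > result;
    HashTable::const_iterator it = ht.begin();
    while (it != ht.end()) {
        // All entries with this key form one contiguous range.
        std::pair<HashTable::const_iterator, HashTable::const_iterator> range =
            ht.equal_range(it->first);
        if (std::distance(range.first, range.second) > 1) {
            for (HashTable::const_iterator jt = range.first; jt != range.second; ++jt) {
                result.push_back(std::make_pair(jt->first, jt->second));
            }
        }
        // Continue with the next key group instead of the next entry.
        it = range.second;
    }
    return result;
}
```

Every entry is still touched, so this remains linear in the table size; it only avoids repeating equal_range() for each duplicate of the same key.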

You can hide the outer loop using std::for_each(), but I don't think it's worth it.

Oswald

I think that you should change your data model to something like:

std::map<int, std::vector<int> > ht;

Then you could easily iterate over the map and check how many items each element contains with size().

But in this situation, building the data structure and reading it linearly is a little more complicated.
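A minimal sketch of this data model, assuming int for both SIG and ID as in the question. The names GroupedTable, add_entry and shared_sigs are hypothetical. As the question points out, the per-key vector adds memory overhead when most keys are unique, so this only illustrates the lookup side.

```cpp
#include <map>
#include <vector>

// <SIG> -> all <ID>s stored under that signature
typedef std::map<int, std::vector<int> > GroupedTable;

// Hypothetical helper: inserting means appending to the per-key vector.
void add_entry(GroupedTable& ht, int sig, int id)
{
    ht[sig].push_back(id); // operator[] creates an empty vector on first use
}

// Hypothetical helper: returns the SIGs that map to more than one ID.
std::vector<int> shared_sigs(const GroupedTable& ht)
{
    std::vector<int> sigs;
    for (GroupedTable::const_iterator it = ht.begin(); it != ht.end(); ++it) {
        if (it->second.size() > 1) {
            sigs.push_back(it->first);
        }
    }
    return sigs;
}
```

Reading all entries linearly now takes a nested loop over each key's vector, which is the extra complication mentioned above.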

jtomaszk
    This is not better than using `unordered_map<SIG, vector<ID> >`. A node-based container like `set` is not necessary as I know that my inserted IDs will be unique. – Stefan May 31 '13 at 07:49