I require a searchable collection of GUIDs (stored as 16 bytes) where the actual unique ID is a member of a smart pointer struct/class. This is reference counted and pointed to by other objects on a 'last reference deletes' basis - akin to std::shared_ptr. But because of the custom nature of my smart pointer class I don't want to use shared_ptr.

However, I do want to use something like std::unordered_map or std::unordered_set (if they are fast enough) to hold the collection of pointers.

Even though the smart pointer's address is unique and would therefore make a good hash, I need the searchable key in the table to be the GUID, so that I can use find(guid) to quickly locate the correct smart pointer.

Hard to explain in words, so here is some code:

class SmartPointer
{
public:
   GUID guid;
   int refCount; // Incremented/decremented when things point or stop pointing to it.
   void* actualObject; // The actual data attached to the smart pointer.
};

// Generate a unique 128-bit id.
GUID id;

// Create the smart pointer object.
SmartPointer* sp = new SmartPointer();
sp->guid = id;

// Create a set of SmartPointers.
std::unordered_set<SmartPointer*> set;
set.insert(sp);

// Need to find a SmartPointer based on a known GUID and not the pointer itself.
SmartPointer* found = set.find(id);

I think this should be possible with some dereferencing in custom hash/equality functions like here but I am not sure exactly how.

CWats
    Is there a reason you're writing your own smart pointer class instead of using an existing one? I can see some problems down the road with the implementation you've created. – Drew Dormann Mar 20 '13 at 14:21
    You just need to implement your own hash functor that takes a `SmartPointer*` and does the right thing. Then instantiate the map with it. See [here](http://en.cppreference.com/w/cpp/container/unordered_map/unordered_map). – juanchopanza Mar 20 '13 at 14:24

1 Answer

With standard hash containers, you need a hashable key (that is, something that can be converted into a size_t by a hashing algo) and an equivalence operator for the keys, in case the hashes collide (i.e., two GUIDs are different but hash to the same value).

In order to look up a SmartPointer by a GUID, you probably want an unordered_map rather than an unordered_set. See the example at the bottom.

In any case, there are two ways to hash a custom class: (1) specialize std::hash, or (2) provide a hashing functor, and possibly an equality functor, in your hash container's type definition.

Specialize std::hash

To specialize std::hash for your smart pointer so that it looks at the GUID, do something like this:

namespace std 
{ 
    // Defined first so that hash< SmartPointer > below can use it.
    template<>
    struct hash< GUID > 
    {
        size_t operator()( const GUID& id ) const
        {
            return /* Some hash algo to convert your GUID into a size_t */;
        }
    };

    template<>
    struct hash< SmartPointer > 
    {
        size_t operator()( const SmartPointer& sp ) const
        {
            // Construct the GUID hasher, then call it on the member.
            return hash< GUID >()( sp.guid );
        }
    };
}

(The two specializations could be combined depending on your needs. If you only use GUIDs as the hashing key, as in the unordered_map example below, then you don't need the specialization for SmartPointer. If you only hash SmartPointer as your key, as you would when using only std::unordered_set, then you could hash sp.guid directly in the SmartPointer specialization rather than having it kick the can on to the GUID specialization.)

With these specializations defined, it will automagically hash for you in standard hash-based containers, assuming you have equality comparisons defined for your key type. Use it like: std::unordered_map<GUID, SmartPointer> or std::unordered_set<SmartPointer>. (For more on specializing in this way, cf. How to extend std::tr1::hash for custom types?)
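As a rough illustration of what the GUID specialization might look like once the placeholder is filled in, here is a minimal sketch. It assumes a GUID that is just 16 raw bytes in a std::array (your real type may differ) and uses 64-bit FNV-1a as the hash, which is only one reasonable choice among many. The lookupDemo helper is hypothetical and exists only to show the map in use.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Assumed layout for illustration: 16 raw bytes. The real GUID type may differ.
struct GUID
{
    std::array<std::uint8_t, 16> bytes;

    friend bool operator==( const GUID& a, const GUID& b )
    {
        return a.bytes == b.bytes;
    }
};

namespace std
{
    template<>
    struct hash< GUID >
    {
        size_t operator()( const GUID& id ) const
        {
            // 64-bit FNV-1a over the 16 bytes, narrowed to size_t at the end.
            std::uint64_t h = 14695981039346656037ULL; // FNV offset basis
            for ( std::uint8_t b : id.bytes )
            {
                h ^= b;
                h *= 1099511628211ULL;                 // FNV prime
            }
            return static_cast< size_t >( h );
        }
    };
}

// Hypothetical helper: store two values keyed by GUID and look one up again.
inline int lookupDemo()
{
    GUID a{}; a.bytes[0] = 1;
    GUID b{}; b.bytes[0] = 2;

    std::unordered_map< GUID, int > m; // picks up std::hash< GUID > automatically
    m[a] = 10;
    m[b] = 20;
    return m.find( b )->second;
}
```

No hasher appears in the map's template arguments: the container finds the std::hash specialization on its own, and operator== covers the collision case.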

Use custom functors in the hash container type

For (2), you could change the type of your unordered set/map and supply functor(s) as template param(s):

struct HashSmartPointer
{
    std::size_t operator()( const SmartPointer& sp ) const
    {
        return /* Some hash algo to convert your sp.guid into a size_t */;
    }
};

std::unordered_set< SmartPointer, HashSmartPointer > mySet;

Again, this assumes you have equality defined for SmartPointer to handle collisions; otherwise, add another parameter to the unordered_set's template arguments for the equality functor.
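The same idea can be applied to a set of SmartPointer*, as in the question, by having both functors dereference the pointer. The sketch below is one possible arrangement, not the only one: the 16-byte GUID layout, the names HashByGuid, EqualByGuid and PointerSet, and the findByGuid helper are all assumptions made for illustration, and the multiply-by-31 hash is a placeholder for something better. Lookup by GUID works by filling in a temporary "probe" object and searching for its address.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Assumed layout for illustration: 16 raw bytes.
struct GUID
{
    std::array<std::uint8_t, 16> bytes;
};

class SmartPointer
{
public:
    GUID guid;
    int refCount;
    void* actualObject;
};

// Hash the GUID behind the pointer, not the pointer value itself.
struct HashByGuid
{
    std::size_t operator()( const SmartPointer* sp ) const
    {
        std::size_t h = 0;
        for ( std::uint8_t b : sp->guid.bytes )
            h = h * 31 + b; // placeholder hash; swap in something stronger
        return h;
    }
};

// Two pointers are "equal" for the set when their GUIDs match.
struct EqualByGuid
{
    bool operator()( const SmartPointer* a, const SmartPointer* b ) const
    {
        return a->guid.bytes == b->guid.bytes;
    }
};

typedef std::unordered_set< SmartPointer*, HashByGuid, EqualByGuid > PointerSet;

// Hypothetical helper: find the stored pointer whose GUID equals `id`,
// or return nullptr if there is none.
inline SmartPointer* findByGuid( const PointerSet& set, const GUID& id )
{
    SmartPointer probe;  // temporary used only as a search key
    probe.guid = id;     // the functors read nothing but the GUID
    PointerSet::const_iterator it = set.find( &probe );
    return it == set.end() ? nullptr : *it;
}
```

Only one copy of each GUID exists, inside the SmartPointer itself. The catch is that a GUID must not change while its SmartPointer is in the set, and every stored pointer must stay valid for as long as it is in there.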

Complete example

Here's a complete program that demonstrates what I think you're asking for:

#include <vector>
#include <cstdlib>
#include <cstdint>
#include <algorithm>
#include <numeric>   // std::accumulate, used in HashGUID
#include <cassert>
#include <unordered_map>

class GUID // Some goofy class. Yours is probably better
{
public:
   std::vector<uint8_t> _id;

   GUID()
     : _id(16)
   {
      std::generate(_id.begin(),_id.end(), std::rand);
   }

   friend bool operator ==( const GUID& g1, const GUID& g2 )
   {
      return std::equal( g1._id.begin(), g1._id.end(), g2._id.begin() );
   }

   friend bool operator !=( const GUID& g1, const GUID& g2 )
   {
      return !(g1 == g2);
   }
};

class SmartPointer
{
public:
   GUID guid;
   int refCount = 0; // Incremented/decremented when things point or stop pointing to it.
   void* actualObject = nullptr; // The actual data attached to the smart pointer.

   friend bool operator ==( const SmartPointer& p1, const SmartPointer& p2 )
   {
      // This may not be right for you, but good enough here
      return p1.guid == p2.guid;
   }
};

struct HashGUID
{
    std::size_t operator()( const GUID& guid ) const
    {
        // Do something better than this. As a starting point, see:
        //   http://en.wikipedia.org/wiki/Hash_function#Hash_function_algorithms
        return std::accumulate( guid._id.begin(), guid._id.end(), std::size_t(0) );
    }
};

int main()
{
   // Create the smart pointer object.
   SmartPointer sp1, sp2, sp3;

   assert( sp1.guid != sp2.guid );
   assert( sp1.guid != sp3.guid );
   assert( sp2.guid != sp3.guid );

   // Create a map from GUID to SmartPointer.
   std::unordered_map<GUID, SmartPointer, HashGUID> m;
   m[sp1.guid] = sp1;
   m[sp2.guid] = sp2;
   m[sp3.guid] = sp3;

   const GUID guid1 = sp1.guid;    
   const GUID guid2 = sp2.guid;    
   const GUID guid3 = sp3.guid;    

   // Need to find a SmartPointer based on a known GUID and not the pointer itself.
   auto found1 = m.find( guid1 );
   auto found2 = m.find( guid2 );   
   auto found3 = m.find( guid3 );   

   assert( found1 != m.end() );
   assert( found2 != m.end() );
   assert( found3 != m.end() );

   assert( found1->second == sp1 );
   assert( found2->second == sp2 );
   assert( found3->second == sp3 );
}

Update for OP comment below

As a rule of thumb, if you're storing raw pointers in standard containers, you're probably doing it wrong. Doubly so if you're storing a raw pointer to a smart pointer. The point of reference-counted pointers is that the contained pointee (actualObject) is not duplicated while there can be many copies of the smart pointer apparatus floating around, each corresponding to one increment of the reference count and each referring to the same contained object. Hence, you'd typically see something like std::unordered_set< std::shared_ptr<MyClass>, Hasher, Equality >.
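To make that last type concrete, here is a small sketch of what Hasher and Equality might look like. MyClass and its name member are placeholders, and the functors assume the set never holds a null pointer. The useCountAfterInsert helper is hypothetical and only demonstrates that the set holds its own counted reference.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>

// Placeholder pointee; `name` stands in for whatever identifies the object.
struct MyClass
{
    std::string name;
};

// Hash the pointee's identifying member rather than the pointer value.
struct Hasher
{
    std::size_t operator()( const std::shared_ptr<MyClass>& p ) const
    {
        return std::hash<std::string>()( p->name );
    }
};

// Compare pointees so that the hash and the equality test agree.
struct Equality
{
    bool operator()( const std::shared_ptr<MyClass>& a,
                     const std::shared_ptr<MyClass>& b ) const
    {
        return a->name == b->name;
    }
};

typedef std::unordered_set< std::shared_ptr<MyClass>, Hasher, Equality > SharedSet;

// Hypothetical helper: reports the reference count while the set holds a copy.
inline long useCountAfterInsert()
{
    std::shared_ptr<MyClass> p = std::make_shared<MyClass>();
    p->name = "widget";

    SharedSet set;
    set.insert( p );      // the set stores its own copy of the shared_ptr
    return p.use_count(); // one reference in `p`, one inside the set
}
```

The set takes part in the reference counting like any other owner, so the pointee cannot be destroyed while it is still in the container.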

If you want to have one GUID for all the instances of your SmartPointer, you may want to have the GUID be a (secret) part of the ref-counted data:

class SmartPointer
{
public:
   int refCount; // Incremented/decremented when things point or stop pointing to it.
   struct
   {
       GUID guid;
       void* actualObject; // The actual data attached to the smart pointer.
   } *refCountedData;
};

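Using the SmartPointer with std::unordered_set, you'd have only one copy of the GUID, but since all the hashing machinery is internal to the std::unordered_set, you don't have direct access to the hashing key: find() takes an element of the set's key type, not a bare GUID (at least before C++20, which added heterogeneous lookup for unordered containers whose hash and equality functors are both marked is_transparent). Without that, looking an element up by GUID alone means either building a temporary element to search with or doing a manual search, the latter of which negates the advantage of hashing.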

To get what you want, I think you either need to define your own hash container that gives you more control over hashing from the outside, or use something like an intrusively reference-counted GUID object, e.g.:

class GUID
{
public:
    typedef std::vector<std::uint8_t> ID;
    int refCount;
    ID* actualID;
    // ...
};

SmartPointer sp1, sp2;

std::unordered_map< GUID, SmartPointer > m;
m[ sp1.guid ] = sp1;
m[ sp2.guid ] = sp2;

In this case, only one copy of the GUID actualID exists, even though it's the key to the map and a member of the value in the map, but there will be multiple copies of its ref-counting apparatus.

On a 64-bit system, the count may be 32-bit and the pointer 64-bit, which means 12 bytes total for each copy of a GUID object, and a savings of 4 bytes per actual GUID. With a 32-bit counter and pointer, it would save 8 bytes per GUID, and with a 64-bit counter and pointer, it would take the same space as the GUID data. One of the first two may or may not be worth it in your application/platform, but the last is not likely worth it.
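The arithmetic above can be checked on a given platform with sizeof. The sketch below uses a hypothetical GuidHandle struct with a 32-bit counter and a raw pointer. Note that on common 64-bit ABIs the compiler pads such a struct so the pointer is 8-byte aligned, in which case sizeof reports 16 rather than 12 and the first case saves nothing unless the struct is packed or laid out differently.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical handle: a 32-bit counter plus a pointer to the shared ID bytes.
struct GuidHandle
{
    std::int32_t  refCount;
    std::uint8_t* actualID;
};

// Size of the raw GUID data being shared.
const std::size_t kGuidBytes = 16;

// Sum of the two members, ignoring any alignment padding.
const std::size_t kPayloadBytes = sizeof( std::int32_t ) + sizeof( std::uint8_t* );

// What each copy of the handle actually occupies, padding included.
const std::size_t kHandleBytes = sizeof( GuidHandle );
```

Comparing kHandleBytes with kGuidBytes on the target platform shows whether sharing the ID saves any space per copy, before counting the one shared copy of the ID itself.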

If it were me, I'd just make a copy of the GUID object as the key until I knew it was unacceptable based on measurement. Then I could optimize the implementation of the GUID object to be internally ref-counted without affecting the user's code.

metal
  • Bear in mind that OP is storing `SmartPointer*` in his/her map. Also, the `unary_function` stuff is deprecated in C++11. – juanchopanza Mar 20 '13 at 15:31
  • Noted and removed. I did change the OP's code a bit, but the idea still stands. – metal Mar 20 '13 at 15:54
  • Thanks metal. Yes, that idea looks close to what I was thinking. The problem, though, is that it would mean storing two versions of the GUID, one in the smart pointer and one in the map. This would be restrictive as I may want to store millions of smart pointers, and I only want one copy of the GUID, which must be in the SmartPointer class itself. Hence the requirement for a pointer to the SmartPointer from the map. The map should reference the SmartPointer in the same way as any other object (with reference counting). – CWats Mar 20 '13 at 17:38
  • Didn't think of the potential sizes of the various pointers relative to the size of the GUID. I will probably use a copy of the ID as the key in the map as you say and reconsider only if it causes a problem later. Thanks. – CWats Mar 20 '13 at 22:26