
I have been reading about and experimenting with the standard library's smart pointers, unique_ptr and shared_ptr, and although they are obviously great replacements for many of the cases where raw pointers might be deemed dangerous, I am unsure of their use when implementing data structures.

To experiment, I have written an example of a hash map which uses shared_ptr, which according to Meyers' Effective Modern C++ is roughly double the size of a unique_ptr. For this reason I would like to use unique_ptr, but I am somewhat stumped by what I am doing in the Add function: updating and copying.

Does anyone have any suggestions for this problem? Should data structures still be written using raw pointers?

#pragma once
#include "core.h"

const int TABLE_SIZE = 256;

template<typename K>
class HashKey {
public:
    unsigned long operator()(const K& p_key) const {
        return (p_key) % TABLE_SIZE;
    }
};

template<typename K, typename T>
class HashNode {
public:
    K m_key;
    T m_value;
    std::shared_ptr<HashNode> next = nullptr;
};


template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array< std::shared_ptr< HashNode<K, T> >, TABLE_SIZE > m_table; // must match the hash range
    F m_hash_function;
    int m_elem_count{ 0 };

    void Add(K p_key, T p_value);
};

template<typename K, typename T, typename F>
void HashMap<K, T, F>::Add(K p_key, T p_value)
{
    unsigned long key = m_hash_function(p_key);

    std::shared_ptr<HashNode<K, T>> new_node = std::make_shared<HashNode<K, T>>();
    new_node->m_key = p_key;
    new_node->m_value = p_value;


    if (m_table[key] == nullptr) {
        /* only item in the bucket */
        m_table[key] = std::move(new_node);
        m_elem_count++;
    }
    else {
        /* check if item exists so it is replaced */
        std::shared_ptr< HashNode<K, T> > current = m_table[key];
        std::shared_ptr< HashNode<K, T> > previous = m_table[key];
        while (current != nullptr && p_key != current->m_key ) {
            previous = current;
            current = current->next;
        }
        if (current == nullptr) {
            previous->next = new_node;
            //current = new_node;
            m_elem_count++;
        }
        else {
            current->m_value = p_value;
        }

    }
}

void TestHashMap() {

    HashMap<int, std::string> hash_map;

    hash_map.Add(1, "one");
    hash_map.Add(2, "does");
    hash_map.Add(3, "not");
    hash_map.Add(50, "simply");
    hash_map.Add(11, "waltz");
    hash_map.Add(11, "into");
    hash_map.Add(191, "mordor");

    std::cout << hash_map.m_elem_count << std::endl;
}
– dgouder
  • Generally speaking, you should think about the new smart pointers in terms of *ownership* instead of simply auto-deleting pointers. Can a "resource" have multiple simultaneous owners (`std::shared_ptr`), or only a single owner at a time (`std::unique_ptr`)? – Some programmer dude Jan 07 '16 at 16:46
  • I’m not sure why you feel the need for pointers here at all — or `std::array`, for that matter. Why wouldn’t you implement the hash map in terms of `std::vector`? Generally, the question puzzles me a bit: most of what you should be doing is writing data structures (and algorithms for them). Consequently, you would *of course* use smart pointers, not dumb pointers, for data structures — what else would you use smart pointers for, then? There’s nothing else. – Konrad Rudolph Jan 07 '16 at 17:04
  • When writing code such as a rotate in a tree data structure, I tend to micro-optimize, which typically includes using pointers a few instructions after they have given up conceptual ownership. As long as I am using raw pointers and I understand the ownership semantics of the code, I can bend the ownership rules exactly where it is safe and efficient to do so. With a smart pointer, the optimizer could never get the generated code back to as efficient as what I would have written with raw pointers. A less skilled programmer would find fewer opportunities for benefit and risk more mistakes. – JSF Jan 07 '16 at 17:04
  • @JSF I feel that that’s a fallacy. Sure, there are certain situations where you can optimise (for instance, `std::unique_ptr` cannot currently be passed into a function inside a register, a raw pointer can). But in general it’s hard to make such assertions since optimisers in compilers get ever better and many assumptions you’ll make in manual implementations change from platform to platform. In particular, I’m not even sure what you mean by “using … after [giving up] conceptual ownership”. That sounds like UB. – Konrad Rudolph Jan 07 '16 at 17:08
  • Unique "conceptual" ownership means one pointer at a time "owns" the object, and the programmer knows which one and the compiler doesn't. After handing off conceptual ownership, you still have a pointer to a valid object until/unless the new owner deletes it. So between the moment ownership is handed off and some later time at which the new owner could have deleted the object, a raw pointer has valid access to the object, where a `std::unique_ptr` would not (all assuming no need for exception safety in that part of the code). – JSF Jan 07 '16 at 17:12
  • @JSF Can you give a specific example of that? It seems like a *very* specific situation. In general, I’d just use normal `unique_ptr`s to manage the ownership (transfer) and raw pointers to maintain a transitive, non-owning reference to the same object. – Konrad Rudolph Jan 07 '16 at 17:14

1 Answer


The choice of smart pointer depends on how your data structure "owns" the heap-allocated objects.

If you need to simply observe an object, and not own it (independently of whether it is heap-allocated or not), a raw pointer, a reference, or a std::reference_wrapper is the appropriate choice.

If you need unique ownership (at most one owner of the heap-allocated object), then use std::unique_ptr. It has no additional time/memory overhead.

If you need shared ownership (any number of owners of the heap-allocated object), then use std::shared_ptr. It incurs additional time/memory overhead, because an additional pointer (to the reference-count control block) has to be stored, and because updates to the reference count are guaranteed to be thread-safe (atomic).

There's no need to use std::unique_ptr (in place of a raw pointer) unless you actually need to own the object.

Assuming you need to own the object, there's no need to use std::shared_ptr (in place of an std::unique_ptr) unless you actually need shared ownership semantics.
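
For illustration, here is a minimal sketch of the three cases (Widget is a placeholder type used only for this example):

#include <memory>

struct Widget { int value = 0; };

int main() {
    // Unique ownership: exactly one owner; same size and cost as a raw
    // pointer with a hand-written delete.
    std::unique_ptr<Widget> sole = std::make_unique<Widget>();

    // Non-owning observation: a raw pointer that must not outlive `sole`.
    Widget* observer = sole.get();
    observer->value = 42;

    // Shared ownership: a control block tracks the reference count and
    // keeps the object alive until the last shared_ptr is destroyed.
    std::shared_ptr<Widget> a = std::make_shared<Widget>();
    std::shared_ptr<Widget> b = a; // reference count is now 2
}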


In your case, it looks like you have a fixed maximum number of heap-allocated nodes in your HashMap. Therefore, I'm assuming that you want the HashMap instance to be the unique owner of the nodes.

What type should you use?

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array</* ? */, TABLE_SIZE > m_table;
    // ...
};

You have two options:

  1. If you want to store the objects with a heap indirection, use std::unique_ptr, as the unique owner of these heap-allocated objects is and will always be the HashMap instance.

  2. If you want to store the objects directly inside the HashMap, with no heap indirection, then do not use any pointer at all. This could lead to very big HashMap instances, and the interface for accessing the next node becomes cumbersome.


Option 1 (store nodes in the heap):

This is the most common, and probably best option.

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array<std::unique_ptr<HashNode<K, T>>, TABLE_SIZE > m_table;
    // ...
};

This will result in lighter (in terms of memory footprint) HashMap instances.

Note: using a std::vector in place of the std::array will reduce the size of HashMap significantly, but will introduce an additional heap indirection. This is the common way of implementing such a data structure: you generally want the HashMap instance to be as lightweight as possible, so that it can be copied/moved/stored efficiently.
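
For instance, a sketch of that std::vector-based variant, reusing TABLE_SIZE, HashKey, and HashNode from the question:

#include <memory>
#include <vector>

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    HashMap() : m_table(TABLE_SIZE) {} // TABLE_SIZE empty buckets

    // The bucket storage now lives on the heap; the HashMap object itself
    // is only a few pointers wide and therefore cheap to move around.
    std::vector<std::unique_ptr<HashNode<K, T>>> m_table;
    F m_hash_function;
    int m_elem_count{ 0 };
};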

There's no need to use smart pointers to connect the nodes between each other, as the nodes are owned exclusively by HashMap. A raw pointer will be sufficient.

template<typename K, typename T>
class HashNode {
public:
    // ...
    // Non-owning link: the HashMap, not the node, owns every node.
    HashNode* next_ptr = nullptr;
    auto& next()
    {
        assert(next_ptr != nullptr); // requires <cassert>
        return *next_ptr;
    }
};

The code above will work fine, assuming that the HashMap is still alive when accessing next.
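
To address the question directly: with this layout, Add needs no shared_ptr at all. Below is one possible sketch, keeping the member names from the question. Note one deliberate deviation from the raw next_ptr above: here each node owns its successor through a std::unique_ptr link, so resetting a bucket head releases its whole chain automatically.

#include <memory>

template<typename K, typename T>
class HashNode {
public:
    K m_key;
    T m_value;
    std::unique_ptr<HashNode> next = nullptr; // owning link to the successor
};

template<typename K, typename T, typename F>
void HashMap<K, T, F>::Add(K p_key, T p_value)
{
    unsigned long key = m_hash_function(p_key);

    // Traverse with a non-owning raw pointer; ownership stays in the map.
    HashNode<K, T>* current = m_table[key].get();
    while (current != nullptr && current->m_key != p_key) {
        if (current->next == nullptr) break; // stop at the tail
        current = current->next.get();
    }

    if (current != nullptr && current->m_key == p_key) {
        current->m_value = std::move(p_value); // existing key: replace value
        return;
    }

    auto new_node = std::make_unique<HashNode<K, T>>();
    new_node->m_key = std::move(p_key);
    new_node->m_value = std::move(p_value);

    if (current == nullptr)
        m_table[key] = std::move(new_node); // first item in the bucket
    else
        current->next = std::move(new_node); // append at the tail
    m_elem_count++;
}

One caveat of the owning-link variant: destroying a very long chain recurses once per node. With the raw next_ptr links described above, HashMap's destructor would instead walk and delete each chain iteratively.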


Option 2 (store nodes in the map instance):

template<typename K, typename T, typename F = HashKey<K>>
class HashMap {
public:
    std::array<HashNode<K, T>, TABLE_SIZE > m_table;
    // ...
};

HashMap instances may be enormous, depending on the size of HashNode<K, T>.

If you choose to store the nodes directly into the HashMap with no heap indirection, you will have to use an index to access the internal array, as moving/copying the HashMap around will change the memory address of the nodes.

template<typename K, typename T>
class HashNode {
public:
    // ...
    int next_index = -1; // -1 means "no next node in this bucket"
    auto& next(HashMap& map)
    {
        assert(next_index != -1);
        return map.m_table[next_index]; // indices stay valid across map moves/copies
    }
};
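
One wrinkle with this variant: HashNode names HashMap and vice versa, so a forward declaration is needed. A minimal self-contained sketch of that arrangement, reusing TABLE_SIZE from the question and omitting the hash functor for brevity:

#include <array>
#include <cassert>

template<typename K, typename T>
class HashMap; // forward declaration breaks the HashNode/HashMap cycle

template<typename K, typename T>
class HashNode {
public:
    K m_key{};
    T m_value{};
    int next_index = -1;                // -1 means "no next node"
    HashNode& next(HashMap<K, T>& map); // defined once HashMap is complete
};

template<typename K, typename T>
class HashMap {
public:
    std::array<HashNode<K, T>, TABLE_SIZE> m_table;
};

template<typename K, typename T>
HashNode<K, T>& HashNode<K, T>::next(HashMap<K, T>& map)
{
    assert(next_index != -1);
    return map.m_table[next_index]; // indices survive moves/copies of the map
}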
– Vittorio Romeo
  • Option 1 is exactly what I had in mind as an alternative, I just wasn't sure if it was the right way to go, thank you very much! Would you care to elaborate on the term _heap indirection_? – dgouder Jan 07 '16 at 17:12
  • @dgouder: it is not an "official" term - I used it to describe the indirection that happens when you access an object allocated on the heap through a pointer. `SomeClass x; x.some_method();` <- no indirection here. `auto x = std::make_unique<SomeClass>(); x->some_method();` <- indirection: you have to get the object pointed to by `x` on the heap. This is potentially cache-unfriendly as you have to jump around memory. – Vittorio Romeo Jan 07 '16 at 17:18