4

Using the C++17 node-extract() functions, I can change the key without having to re-allocate the node. In my particular use-case, I'm replacing the key with an equal one, so I'd like to use insert()-with-hint to avoid a full second lookup, and that works fine when using std::map:

struct Replacement { std::string from, to; };
std::map<std::string_view, Replacement> replacements;
~~~~
std::string from = ~~~;
std::string to = ~~~:
auto [it, inserted] = replacements.try_emplace(from, std::move(from), std::move(to));
if (inserted) {
    // need to patch up the key, which points to invalid data now:
    auto node = replacements.extract(it++);    // 1
    node.key() = node.mapped().from;           // 2
    replacements.insert(it, std::move(node));  // 3
}

As I insert() a new node, its key_type will continue to point to from.data(), which, however, gets moved from as part of the construction of the mapped_type in the emplacement. Because of SSO (small-string optimisation), the moved string's data() may differ from the source string's data(), invalidating the key_type as soon as from's lifetime ends, possibly sooner. So we need to re-set the key_type to point into the mapped_type, for which we (1) extract the node, making sure to keep it valid by incrementing it before the extract(), then (2) patching up the key as required and finally (3) re-inserting the node at the old position, given by the hint it.

So far so good. Now try the same with unordered_map:

~~~~
std::string from = ~~~;
std::string to = ~~~:
auto [it, inserted] = replacements.try_emplace(from, std::move(from), std::move(to));
if (inserted) {
    // need to patch up the key, which points to invalid data now:
    auto node = replacements.extract(it++);    // only thing possible, but wrong direction
    // auto node = replacements.extract(it--); // right direction, but ERROR: unordered_map isn't bidirectional
    node.key() = node.mapped().from;           // 2
    replacements.insert(it, std::move(node));  // 3
}

Here, I run into the problem that unordered_map, being a vector<forward_list>, only has forward iterators, so I cannot, as I would have to, go backwards to remember the exact insertion point for the final insert-with-hint (3) in iteration order. It seems I have but two choices:

  1. use insert(node) without a hint, causing a full second lookup
  2. increment the iterator and pass that as the hint. The std says that the insertion is performed as close as possible to the hint, but with the intra-bucket iterators, at least, being forward-only, if my hash table is as sparse as is required for good performace, the incremented iterator may be several buckets down the bucket list, making the hint useless.

So, ok, the hint is useless for unordered_map</rubberducking>. To save the question, let's talk about unordered_multimap, then :)

In unordered_map, the hint could be useful, as, unlike unordered_map, unordered_multimap needs to scan the bucket to put the new node into a potential equal_range() of the key. Is there a better way to produce a hint than post-increment or not-at-all?

Marc Mutz - mmutz
  • 24,485
  • 12
  • 80
  • 90
  • 1
    I would just say: If scanning the bucket is a performance concern then you should consider use the regular `map` but who doesn't like a good "Gedankenexperiment" right? – Superlokkus Aug 10 '21 at 08:00
  • 1
    Is passing both `from` and `std::move(from)` to the same function call defined behavior? – Enlico Aug 10 '21 at 08:00
  • With map, full lookup takes _O(log n)_ time, and providing an insertion hint can reduce it to _O(1)_. With unordered map, you have _O(1)_ time even with full lookup. It is questionable whether insertion with hint into unrodered map makes it more efficient in practice. Some benchmarking might be useful. – Daniel Langr Aug 10 '21 at 08:17
  • You're trying to save a nickel when you could save a buck by switching to a linear probing hash map implementation, such as e.g. dense_hash_map. – rustyx Aug 10 '21 at 08:18
  • 1
    BTW, with map, hint is _"iterator to the position before which the new element will be inserted"_. Shouldn't you increment `it` instead of decrement? – Daniel Langr Aug 10 '21 at 08:23
  • @Enlico: yes, it's fine, because the first `from` is implicitly converted to `string_view` before calling the `try_emplace()`, and `std::move` doesn't move (it's just a cast). The move happens inside `try_emplace()`, at a time when the node is already allocated, and, one would argue, the hash over the key has already been calculated. That said, _if_ a moved-from SSO string would be set to empty _and_ the hash was calculated _twice_, _then_ we'd have a problem. Just one of the two doesn't invalidate the pattern. But such an inefficient implementation would quickly fix these QoI issues. – Marc Mutz - mmutz Aug 11 '21 at 06:03
  • @rustyx: I believe such hash tables don't guarantee stability of reference, in which case any rehash would invalidate my keys anew. – Marc Mutz - mmutz Aug 11 '21 at 06:06
  • @DanielLangr Ah, that's "new" in C++11. Fixed, thanks. – Marc Mutz - mmutz Aug 11 '21 at 06:08
  • @MarcMutz-mmutz, sorry, but isn't the "_if_ a moved-from SSO string..." exactly the reason why it is undefined behavior, because the behavior depends on implementation details and such? – Enlico Aug 11 '21 at 07:20
  • 1
    @Enlico depending on implementation detail doesn't make code undefined behaviour. Running into undefined behaviour is undefined behaviour. In this case, if both implementation details were not in our favour, the worst that can happen is that the hash that gets stored in the node was calculated over the data of a moved-from std::string, which is in the infamous valid-but-unspecified state. So, we got a node who's key references moved-from data and a hash value that's wrong. Both are fixed by the assignment of a new key after `extract()`ing and before `insert()`ing, re-establishing invariants. – Marc Mutz - mmutz Aug 11 '21 at 10:19
  • Ok, so I've misused _undefined behavior_; I should have writte _unspecified state_. Thanks for the explanation, @MarcMutz-mmutz. – Enlico Aug 11 '21 at 10:34

1 Answers1

1

If your only aim is value-preserving key adjustments, you needn’t extract anything: just wrap your key in a class that has it as a mutable member (or holds it via a std::unique_ptr) so that you can legitimately change it in the map. You’ll have to define comparison/hashing operations for the wrapper type, but that’s no more code than the iterator trickery.

Davis Herring
  • 36,443
  • 4
  • 48
  • 76