emplace_hint performance when hint is wrong

Question

I am trying to determine if emplace_hint should be used to insert a key into a multimap (as opposed to regular emplace). I have already calculated the range of the key in an earlier operation (on the same key):

range = multimap.equal_range(key);

Should I use range.first, range.second, or nothing as a hint to insert the key, value pair? What if the range is empty?

Good question. `emplace_hint` inserts the element before the hint, if it can. This almost makes me think you would actually need `++range.second`. — NathanOliver, Jun 04 '18 at 18:31
@NathanOliver `range.second` would already be "one past the end", no? — Drew Dormann, Jun 04 '18 at 18:32
@DrewDormann Ah yes, `second` is the first element greater than the key. With that I think `range.second` is probably guaranteed to be O(1) — NathanOliver, Jun 04 '18 at 18:33
@NathanOliver it will be _amortized_ O(1). In case of red-black-tree-based map implementation worst case complexity of any such insert is O(log N) because of auto-rebalancing — C.M., Jun 04 '18 at 18:48

score 12 · Answer 1 · edited Jun 20 '20 at 09:12

Should I use range.first, range.second, or nothing as a hint to insert the key, value pair?

As std::multimap::emplace_hint() states:

Inserts a new element into the container as close as possible to the position just before hint.

Because the element goes just before the hint, you should use the second iterator of the range, which should make the insertion more efficient:

Complexity

Logarithmic in the size of the container in general, but amortized constant if the new element is inserted just before hint.

As for an empty range, it is still fine to use the second iterator: it always points to the first element greater than the key, or past the last element if no such element exists.
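
A minimal sketch of this advice, assuming a std::multimap<std::string, std::string>. The alias Map and the helpers append_with_hint and values_of are hypothetical names introduced only for illustration; they are not part of the standard library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Map = std::multimap<std::string, std::string>;

// Hypothetical helper: inserts (key, value) using a previously computed
// equal_range for the same key as the hint.
Map::iterator append_with_hint(Map& m,
                               const std::pair<Map::iterator, Map::iterator>& range,
                               const std::string& key,
                               const std::string& value) {
    // Debug-only sanity check: the range must still describe `key`.
    assert(range.second == m.upper_bound(key));
    // range.second is the first element greater than key (or end()),
    // so the new element belongs immediately before it.
    return m.emplace_hint(range.second, key, value);
}

// Hypothetical helper: collects the values stored under `key`, in iteration order.
std::vector<std::string> values_of(const Map& m, const std::string& key) {
    std::vector<std::string> out;
    auto r = m.equal_range(key);
    for (auto it = r.first; it != r.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

Calling append_with_hint(m, m.equal_range("x"), "x", "c") on a map that already holds x => a and x => b should leave the "x" group as a, b, c. With an empty range the same call works unchanged, because range.second is then simply the first element greater than the key, or end().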

answered Jun 04 '18 at 18:33

Slava

Actually, in which case would `first` not work? It seems that `equal_range` always returns a valid `first` that could be used as a hint – Baiyan Huang Dec 11 '20 at 06:07
@baye it works with `first` but it would not be as efficient as using `second`. `first` sometimes points to the right position, `second` always does. – Slava Dec 11 '20 at 15:37
My understanding is that `first` always points to the right position. Would you mind sharing a case in which it does not? – Baiyan Huang Dec 11 '20 at 23:13

Stephan Lechner · Answer 2 · 2018-06-05T21:16:01.527

First, performance wise, it will not make any difference if you use range.first or range.second. Let's have a look at the return value of equal_range:

std::equal_range - return value

std::pair containing a pair of iterators defining the wanted range, the first pointing to the first element that is not less than value and the second pointing to the first element greater than value. If there are no elements not less than value, last is returned as the first element. Similarly if there are no elements greater than value, last is returned as the second element

This means that - when obtained for a value key - both range.first and range.second represent positions where key may be correctly inserted right before. So performance-wise it should not matter if you use range.first or range.second. Complexity should be "amortized constant", since the new element is inserted just before hint.

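Second, when the range is "empty", range.first and range.second are equal: both point to the first element greater than key, or to one-past-the-end if there is none. Performance and result are therefore identical whichever of the two you pass, and the element ends up in the same position as with emplace without any hint.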

See the following program demonstrating this:

#include <iostream>
#include <map>
#include <string>

int main()
{
    std::multimap<std::string, std::string> m;

    // some clutter:
    m.emplace(std::make_pair(std::string("k"), std::string("1")));
    m.emplace(std::make_pair(std::string("k"), std::string("2")));
    m.emplace(std::make_pair(std::string("z"), std::string("1")));
    m.emplace(std::make_pair(std::string("z"), std::string("2")));

    // relevant portion of demo data: order a-c-b may be preserved
    m.emplace(std::make_pair(std::string("x"), std::string("a")));
    m.emplace(std::make_pair(std::string("x"), std::string("c")));
    m.emplace(std::make_pair(std::string("x"), std::string("b")));


    auto r = m.equal_range("x");
    // will insert "x.zzzz" before "x.a":
    m.emplace_hint(r.first, std::make_pair(std::string("x"), std::string("zzzz")));

    // will insert "x.0" right after "x.b":
    m.emplace_hint(r.second, std::make_pair(std::string("x"), std::string("0")));

    auto rEmpty = m.equal_range("e");
    // "empty" range: first == second, both point to the first "k" element:
    m.emplace_hint(rEmpty.first, std::make_pair(std::string("e"), std::string("b")));
    m.emplace_hint(rEmpty.second, std::make_pair(std::string("e"), std::string("a")));

    auto rWrong = m.equal_range("k");
    // "wrong" hint: it points to a "k" element although the key is "z":
    m.emplace_hint(rWrong.first, std::make_pair(std::string("z"), std::string("a")));

    for (const auto &p : m) {
        std::cout << p.first << " => " << p.second << '\n';
    }
}

Output:

e => b
e => a
k => 1
k => 2
x => zzzz
x => a
x => c
x => b
x => 0
z => a
z => 1
z => 2

In short: if you have a valid range for key pre-calculated, then use it when inserting key. It will help anyway.

EDIT:

There have been discussions around whether an "invalid" hint might lead to an insertion at a position that does not then reflect the "order of insertion" for values with the same key. This might be concluded from a general multimap statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".

I did not find support for the one or the other point of view in any normative document. I just found the following statement in cplusplus multimap/emplace_hint documentation:

template <class... Args>
  iterator emplace_hint (const_iterator position, Args&&... args);

position: Hint for the position where the element can be inserted. The function optimizes its insertion time if position points to the element that will follow the inserted element (or to the end, if it would be the last). Notice that this does not force the new element to be in that position within the multimap container (the elements in a multimap always follow a specific order). const_iterator is a member type, defined as a bidirectional iterator type that points to elements.

I know that this is not a normative reference, but at least my Apple LLVM 8.0 compiler adheres to this definition (see demo above): If one inserts an element with a "wrong" hint, i.e. one pointing even before the position where a pair shall be inserted, the algorithm recognizes this and chooses a valid position (see inserting "z"=>"a" where a hint points to an "x"-element). If we use a range for key "x" and use range.first, the position right before the first x is interpreted as a valid position.

So: I think that with m.emplace_hint(r.first,... the algorithm chooses a valid position immediately, and that the rule of inserting as close as possible to the hint overrules the general statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".
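
The effect described in this edit can be reproduced with a smaller sketch. The function order_after_hinted_insert and the alias Map are hypothetical names used only for illustration; the behaviour is what the libc++ and libstdc++ runs discussed on this page show, and it matches the "as close as possible to the position just prior to the hint" wording.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Map = std::multimap<std::string, std::string>;

// Hypothetical demo function: builds a multimap holding x => a and x => b,
// inserts x => new with either range.first or range.second as the hint,
// and returns all values in iteration order.
std::vector<std::string> order_after_hinted_insert(bool hint_at_first) {
    Map m;
    m.emplace("x", "a");
    m.emplace("x", "b");

    auto r = m.equal_range("x");
    assert(r.first != r.second);  // the range covers both "x" elements

    // The element is placed just before the hint when that keeps the keys sorted.
    m.emplace_hint(hint_at_first ? r.first : r.second, "x", "new");

    std::vector<std::string> out;
    for (const auto& p : m) {
        out.push_back(p.second);
    }
    return out;
}
```

With hint_at_first set to true the values come out as new, a, b; with false they come out as a, b, new. Both hints are valid positions for the key, but only range.second keeps the order of insertion among equal keys.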

"This means that - when obtained for a value key - both range.first and range.second represent positions where key may be correctly inserted right before." This is an incorrect statement, so the following "So performance-wise it should not matter if you use range.first or range.second" is incorrect as well — Slava, Jun 05 '18 at 00:39
@Slava: How / based on which reference do you conclude that the statements are incorrect? — Stephan Lechner, Jun 05 '18 at 07:12
From https://en.cppreference.com/w/cpp/container/multimap: "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)". So the `range.first` position is incorrect when there is at least one element in the multimap with the key used for insertion. — vedg, Jun 05 '18 at 07:17
@vedg: I think that exactly because the order of insertion does not change, inserting a pair with key X right before the range of all X will give a valid new range starting with the newly inserted pair (see example and output) — Stephan Lechner, Jun 05 '18 at 07:28
Yes, the range will be valid. However, as I understand it, the "and does not change" part does not apply to the order of insertion, but means that the order of already inserted elements never changes after any subsequent multimap manipulations. The relevant part of the cppreference quote is: "the order ... is the order of insertion", which means that inserting a new element before another element with the same key is forbidden by the multimap API promise. Perhaps I'm misunderstanding the quote from cppreference - the corresponding quote from the C++ standard is welcome. — vedg, Jun 05 '18 at 11:08
[`[associative.reqmts]`](http://eel.is/c++draft/associative.reqmts#tab:containers.associative.requirements) is confusing on this point "`a.emplace_hint(p, args)` is equivalent to `a.emplace( std::forward<Args>(args)...)`" and also "The element is inserted as close as possible to the position just prior to p." — Caleth, Jun 05 '18 at 14:36
@vedg: At least Apple LLVM does not adhere to insertion order when using emplace_hint, but it chooses valid positions with regard to equivalent keys. See edited answer. — Stephan Lechner, Jun 05 '18 at 21:19
@Caleth: At least Apple LLVM does not adhere to insertion order when using emplace_hint, but it chooses valid positions with regard to equivalent keys. See edited answer. — Stephan Lechner, Jun 05 '18 at 21:19
http://coliru.stacked-crooked.com/a/498482ef94105072 - I got the same standard output as embedded in this answer from "clang++ -std=c++1z -stdlib=libc++" and "g++ -std=c++17". The output doesn't change when I replace *emplace* and *emplace_hint* with *insert* everywhere. This empirical evidence suggests that both compilers don't care much about the order of insertion right now. — vedg, Jun 06 '18 at 08:02
When I remove all calls to *equal_range* and all hints, the order of the elements with equal keys becomes the order of insertion both with *emplace* and *insert* calls. So, curiously, in practice the hint is not just an optimization, but changes program output. — vedg, Jun 06 '18 at 08:23
@vedg: yes, seems so. I think "as close as possible" from `emplace_hint` counts more than "order of insertion" from general map description. — Stephan Lechner, Jun 06 '18 at 08:43
