Why does std::pair expose member variables?

Question

From http://www.cplusplus.com/reference/utility/pair/, we know that std::pair has two member variables, first and second.

Why did the STL designers decide to expose two member variables, first and second, instead of offering a getFirst() and a getSecond()?

Don't you think `int` should also have a setter and getter ? — fjardon, Jun 15 '16 at 12:18
*"it will be better if encapsulating two member variables above and give a getFirst(); and getSecond();"* According to whom? Why wrap something in a getter if there is no logic in the getter? — Cody Gray - on strike, Jun 15 '16 at 12:18
Contrary to popular belief, having objects that do nothing but store member variables with getters and setters is not "the way things should be done". — StoryTeller - Unslander Monica, Jun 15 '16 at 12:20
@CodyGray To be fair, pessimistically wrapping things in get/setters makes sense if they might need more complex handling later, as that can then be transparently incorporated into the methods rather than the larger work of replacing all occurrences of those variables with get/set methods on short notice... but yeah, it doesn't make sense for a single-purpose wrapper like this. This shows how a 'best practice' interpreted too broadly is the worst practice. — underscore_d, Jun 15 '16 at 12:22
A `std::pair` has one function - to provide two data items. There is no point in hiding them. — Galik, Jun 15 '16 at 12:25
Getters/Setters would be useful if you had invariants to maintain. `std::pair` has no invariants, its only purpose is to contain 2 pieces of data period. — Borgleader, Jun 15 '16 at 12:26
Actually something you may wish to consider *philosophically* is how is data being hidden (encapsulated) when you provide *getters* and *setters*? Not saying you shouldn't, just saying it raises questions about how object oriented systems are being designed and implemented. Some people argue from a pure OO perspective that you should never have a getter or a setter (nor should you expose data members as public). — Galik, Jun 15 '16 at 12:33
I don't understand downvoters, the OP is genuinely puzzled by the std::pair interface, the question is clear and narrow enough. The answer will be useful for others. — Alessandro Teruzzi, Jun 15 '16 at 12:39
@KerrekSB: You're the first person I've ever heard express that opinion. — Benjamin Lindley, Jun 15 '16 at 13:16
@AlessandroTeruzzi: The downvotes are probably due to the fact that the question only exists because the asker believes in a coding style that is generally frowned upon by many C++ programmers here. Namely "make everything private!" — Nicol Bolas, Jun 15 '16 at 13:40
@NicolBolas That's exactly my point, we need to judge the question, not its premises. We should encourage the OP to ask questions regardless of whether they rest on a fallacious foundation. Being a C++ programmer myself, I welcome questions like this one. — Alessandro Teruzzi, Jun 15 '16 at 13:50
If someone told you it's better to sit down and pray for 10 minutes that your code works before running it, would you take that at face value too? — user541686, Jun 15 '16 at 14:38
If those return a `T&` to the actual first and second... what would that accomplish, other than possibly adding overhead? — , Jun 15 '16 at 15:34
@Mehrdad your comment suffers from the straw man fallacy (The Straw Man, also "The Straw Person" or "The Straw Figure": the fallacy of setting up a phony, weak, extreme or ridiculous parody of an opponent's argument and then proceeding to knock it down with a wave of the hand). Comparing a religious dogma with a sensible-in-most-cases software engineering guideline is an incorrect way of arguing. — Alessandro Teruzzi, Jun 15 '16 at 15:36
Look up *quasiclass*. This knee-jerk wrapping of data members is well criticized. — JDługosz, Jun 15 '16 at 17:08
@underscore no, it's not really beneficial to pessimistically throw getters/setters at things. That amounts to designing for hiding interface breakage. — R. Martinho Fernandes, Jun 16 '16 at 04:43
@R.MartinhoFernandes Care to explain? To me, it seems to _prevent_ "interface breakage" because the caller is dealing with the interface (methods), not its implementation (direct access to internal members), and the former can stay looking the same while the latter changes. Note also that I'm not saying get/set methods should always be used; I'm saying they _can_ have benefits in some cases for classes where they're _currently_ trivial. As mentioned, `pair` is not such a case. — underscore_d, Jun 16 '16 at 07:12
Alex Stepanov explicitly discusses the reason for keeping public member data during the second lesson of his course *Efficient Programming With Components* (see, https://youtu.be/FUMPsmKnKv8?t=895). Although it can be debated if he is right or not, this should – hopefully – count as an objective reference for the design of `std::pair`. — Ilio Catallo, Jun 16 '16 at 07:13
@fjardon from a design point of view int should have accessor because this makes you lose nothing but gain additional flexibility. This flexibility is important for many novel algorithms, e.g., those packing additional bits into keys/values and therefore would need accessors to encode/decode data. — jzl106, Apr 12 '17 at 05:15
@jzl106 And what would the getter of an `int` return ? another `int` ? So you'd have to "get" again the value returned by the getter ? — fjardon, Apr 12 '17 at 19:31
@fjard, the getter may return an int that is NOT stored as an int in the implementing class... for example, imagine an algorithm that can group ints (let's say 4-byte) that share a 3-byte-long common prefix and store them together. In the case of a trie, the common prefix does NOT need to be stored explicitly (it can be implied by the position in its containing array). Then each object only needs to store a byte, and the getter will return the sum of the common prefix and the byte. — jzl106, Apr 13 '17 at 14:47
@fjard And the above is just one of many possible cases...in other cases you may want to pack additional bits to, let's say, the highest byte of the stored int. Those packed bits could be used internally to facilitate search or navigation, among other purposes. And you can unmask those bits in your getter. — jzl106, Apr 13 '17 at 14:59

score 96 · Answer 1 · edited Mar 14 '21 at 15:19

For the original C++03 std::pair, functions to access the members would serve no useful purpose.

As of C++11 and later (we're now at C++17, with C++20 coming up fast) std::pair is a special case of std::tuple, where std::tuple can have any number of items. As such it makes sense to have a parameterized getter, since it would be impractical to invent and standardize an arbitrary number of item names. Thus you can use std::get also for a std::pair.
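
To make the "parameterized getter" point concrete, here is a minimal sketch. The names `sum_of_two` and `demo_pair_access` are made up for illustration; only `std::pair`, `std::tuple`, and `std::get` come from the standard library. It shows that indexed access lets one template serve both types, and that `std::get` on a pair reaches the very same objects as `first` and `second`.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Adds the two elements of anything that supports std::get<0> and std::get<1>,
// so the same code serves std::pair and a two-element std::tuple.
template <typename TwoItems>
auto sum_of_two(const TwoItems& items)
    -> decltype(std::get<0>(items) + std::get<1>(items)) {
    return std::get<0>(items) + std::get<1>(items);
}

// Hypothetical demo: the named members and the indexed getter reach the same objects.
inline void demo_pair_access() {
    std::pair<int, std::string> p{42, "answer"};
    assert(p.first == std::get<0>(p));
    assert(&p.second == &std::get<1>(p));  // same object, not a copy
    std::get<0>(p) = 7;                    // std::get returns a reference
    assert(p.first == 7);
}
```

Generic code can only count, not invent names, which is why the index-based form is the one that scales to tuples of any size.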

So the reasons for the design are historical: the current std::pair is the end result of an evolution towards more generality.

In other news:

regarding

"As far as I know, it will be better if encapsulating two member variables above and give a getFirst(); and getSecond()"

no, that's rubbish.

That's like saying a hammer is always better, whether you're driving in nails, fastening with screws, or trimming a piece of wood. Especially in the last case a hammer is just not a useful tool. Hammers can be very useful, but that doesn't mean that they're “better” in general: that's just nonsense.

edited Mar 14 '21 at 15:19 by Casey

answered Jun 15 '16 at 12:28 by Cheers and hth. - Alf

I hereby award +1 for the admonishment of a slavish devotion to java-like getters and setters of simple data members. – Richard Hodges Jun 15 '16 at 12:37
Using two functions would allow it to benefit from the empty base class optimization. That's how Boost's compressed pair works. – Morwenn Jun 15 '16 at 14:53
Note on tuples: beyond standardizing names, it would also make variadic programming quite difficult. Iterating over an integer is easy, iterating over a list of data member names is harder... (both could cohabit, I guess?) – Matthieu M. Jun 15 '16 at 17:31
@Morwenn: I have not ever needed a `std::pair` with one item being of empty class type. And I can't for the life of me imagine how, if that were actually used for some obscure purpose, optimizing its size could be of any value. One would need at least an array of some million instances of pairs with one empty class item, to save enough memory to at all spot it. – Cheers and hth. - Alf Jun 15 '16 at 19:36
@Cheersandhth.-Alf It might be a useful optimisation on embedded systems, but likely not in the general case. – Justin Time - Reinstate Monica Jun 15 '16 at 20:59
A primary reason to not design fields into public APIs is versioning resiliency. It's not just about what the API needs right now. It might change in the future and callers will be broken. That said it's hard to see a tuple's implementation changing significantly. On the other hand getter functions have no perf impact in an optimized build. – usr Jun 15 '16 at 21:01
@Cheersandhth.-Alf Compressed pairs are used for example in the implementation of `unique_ptr` to get rid of the space for the deleter; similarly, they can be used to get rid of the space for allocators in containers. In containers, there can be multiple potentially stateless objects like comparison function objects etc. Vectors of `unique_ptr`s and so on might be a more realistic example of where such optimization makes sense; also considering memory-bound algorithms. Unfortunately, it seems to make it much harder for the compiler to optimize the code, e.g. for assignment. – dyp Jun 15 '16 at 21:36
@dyp: Thanks, I thought of that but dismissed it as an unreasonable multi-purposing of `std::pair`. The "pair" in "empty base optimized pair" is IMHO an implementation artifact. Which, if this were Usenet, could lead to an interesting (for me) discussion, because I don't understand the purpose of the "pair" at all: I can envision a general utility for EBO that handles only one item, namely the one that one's interested in EBO for. :) – Cheers and hth. - Alf Jun 15 '16 at 22:17
@Cheersandhth.-Alf in practice you can't use EBO for only one item. To do that you would have to inherit directly from the maybe-empty-class, e.g. `std::vector` would have to inherit from its `allocator_type`. That would be **BAD**. Imagine if the allocator has a virtual function `size()` ... now `std::vector::size()` overrides it. **BAD**. So in practice you make `std::vector` have a member variable of type `Impl` which is a struct that inherits from the `allocator_type` and has a single member (maybe the vector's begin pointer, or end pointer). – Jonathan Wakely Jun 15 '16 at 22:54
That's been a common idiom since at least 1997 (see http://www.cantrip.org/emptyopt.html which explains the problem with virtual overriders, and the solution) and a `compressed_pair` type is a simple way to make use of it. I can't for the life of me imagine how you've remained unaware of this idiom ;-) – Jonathan Wakely Jun 15 '16 at 22:55
@JonathanWakely: Thanks, I didn't think of that! So a pair is the minimum that can do the general EBO job without getting possible conflicts for virtual functions. Hm. Learned something, that's great! :) – Cheers and hth. - Alf Jun 16 '16 at 00:08
@JonathanWakely: Oh, I have to retract my new understanding. My original intuition, before the discussion, was correct: namely, there's no need for a pair to handle EBO, a single item suffices. Because it's only in the case of the item being of empty type, that one inherits, and a polymorphic type isn't empty (due to the vtable pointer). So, no trouble with inadvertent overrides. [Example at Coliru](http://coliru.stacked-crooked.com/a/5e06ba30cf6d18b1). So, learned something today too! :) Thanks for that. I guess next, learn why Boost uses a pair. Hm. – Cheers and hth. - Alf Jun 16 '16 at 23:36
_"it's only in the case of the item being of empty type, that one inherits"_ pre-C++11 that's not true, there was no way to detect if a type was empty, so you derived from it unconditionally, and needed to avoid potential problems with virtual functions. Since C++11 you can use `std::is_empty` (and since C++14 `std::is_final`, which your example doesn't account for) to decide if/when to use the EBO, but that's a relatively recent change to the idiom. – Jonathan Wakely Jun 17 '16 at 12:17
`boost::compressed_pair` has a copyright date of 2000. – Jonathan Wakely Jun 17 '16 at 12:24
I am sorry, but in this case, using accessors instead of exposing member variables IS ALWAYS better. With accessors you can inline to get the same performance and add behavior to achieve what member variables can never achieve. That's the flexibility a good design should offer. A common principle in lib design is that you should avoid pure data objects like std::pair, as they do nothing but mandate a particular storage structure rather than leaving that to the implementor. In the case of std::pair, a dire consequence is that all containers that use it have to store data in key/value sequence. – jzl106 Apr 12 '17 at 04:05
Hello from the World of Tomorrow! "*...`std::pair` is a special case of `std::tuple`...*" is not true anymore, (actually, I don't think it ever was). `std::pair` is not derived from `std::tuple` and is literally two members `T1 first; T2 second;` and a bunch of templated constructors for efficiency. `std::tuple` is none of that. – Casey Mar 14 '21 at 15:32

score 29 · Answer 2 · answered Jun 15 '16 at 12:30

Getters and setters are usually useful if one thinks that getting or setting the value requires extra logic (changing some internal state). This can then be easily added into the method. In this case std::pair is only used to provide 2 data values. Nothing more, nothing less. And thus, adding the verbosity of a getter and setter would be pointless.

answered Jun 15 '16 at 12:30 by Hatted Rooster

This is true as far as it goes, but does **not explain** the existing getters for `std::pair`. No "extra logic" is involved. No "changing some internal state". – Cheers and hth. - Alf Jun 15 '16 at 12:39
@Cheersandhth.-Alf I'm talking about (non-static)member functions that are getters and setters, `std::get` is _not_ a member function of `std::pair` and thus that's invalid. – Hatted Rooster Jun 15 '16 at 12:42
Distinguishing member functions from non-members is not meaningful here. Indeed, Stroustrup and Sutter are working on a proposal to effectively eradicate the technical differences. – Cheers and hth. - Alf Jun 15 '16 at 12:44
@Cheersandhth.-Alf That's just syntactic sugar for generic programming. Getters and setters are OOP concepts, and the answer here says that from an OOP point of view, member getters and setters are not needed. Note that even if the unified call syntax makes it possible to call `myPair.get<1>()`, that still won't make `std::get` a method of `std::pair`, it will just "behave" as one from an external point of view. – KABoissonneault Jun 15 '16 at 13:59
@KABoissonneault: Your comment doesn't make sense, I'm sorry. Which indicates a fundamental lack of competence in this area, and/or argumentativeness. – Cheers and hth. - Alf Jun 15 '16 at 19:29
@Cheersandhth.-Alf I'm sorry you couldn't make any sense of my comment. However if you're interested in arguing and not just bashing on other people, you could perhaps point me out some areas you want me to expand on so we can fully understand each other – KABoissonneault Jun 15 '16 at 19:32
Well, here are some reading material suggestions that I think you'll find interesting. I think first of all, read [Scott Meyers on why non-member functions can increase encapsulation](http://www.drdobbs.com/cpp/how-non-member-functions-improve-encapsu/184401197). This goes to the *conceptual* (non-) difference. Secondly, do read [Bjarne's rationale and discussion of **uniform call notation**](https://isocpp.org/blog/2016/02/a-bit-of-background-for-the-unified-call-proposal), where you can choose at will to write `f(o,x,y)` or `o.f(x, y)`, which goes to the *technical* (non-) difference. Enjoy. – Cheers and hth. - Alf Jun 15 '16 at 21:24
Not really true. If your design has functions whose behaviour is meant to change at some point, then either 1) you're just prototyping and your design isn't finished (though the purported advantages of getter/setters are diminished in prototypes) or 2) your design will break client code when the "extra logic" is added in (and the purported advantage of client code compiling without change becomes a severe disability here). In other words, your answer would make sense if it assumed the extra logic is there from the beginning, not an afterthought. – R. Martinho Fernandes Jun 16 '16 at 04:49
@R.MartinhoFernandes You're acting like the internal logic must always have some not only visible but breaking effect on the client, both of which seem like a reach to me. It might just be used for internal housekeeping, tracking, etc., in which case the client gets the same result but the class designer gains flexibility. I wish I could think of a direct example right now, but I doubt it's as bad as you're painting. – underscore_d Jun 16 '16 at 07:16

Ilio Catallo · Answer 3 · 2016-06-16T07:35:07.627

The reason is that no real invariant needs to be imposed on the data structure, as std::pair models a general-purpose container for two elements. In other words, an object of type std::pair<T, U> is assumed to be valid for any possible first and second element of type T and U, respectively. Similarly, subsequent mutations in the value of its elements cannot really affect the validity of the std::pair per se.
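
As an illustration of what "no invariant" means in practice, here is a small sketch contrasting a class that does have one with `std::pair`. The `Interval` class and the `demo_no_invariant` function are hypothetical and exist only for this comparison; they are not part of any library.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical class WITH an invariant: lo() <= hi() must always hold,
// so the data is private and every mutation is checked.
class Interval {
public:
    Interval(int lo, int hi) : lo_(lo), hi_(hi) {
        if (lo > hi) throw std::invalid_argument("lo must not exceed hi");
    }
    int lo() const { return lo_; }
    int hi() const { return hi_; }
    void set_hi(int hi) {
        if (hi < lo_) throw std::invalid_argument("hi must not be below lo");
        hi_ = hi;
    }
private:
    int lo_;
    int hi_;
};

// std::pair has no such rule: every combination of values is a valid pair.
inline void demo_no_invariant() {
    std::pair<int, int> p{10, 1};  // "reversed" order is perfectly legal
    p.second = -5;                 // nothing to check, nothing can break
    assert(p.first == 10 && p.second == -5);
}
```

With `Interval`, direct member access could silently produce an object that violates its own contract; with `std::pair` there is no contract to violate.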

Alex Stepanov (the author of the STL) explicitly presents this general design principle during his course Efficient Programming with Components, when commenting on the singleton container (i.e., a container of one element).

Thus, although the principle itself can be debated, this is the reason behind the shape of std::pair.

Precisely! `std::pair` is just a collection of data. Therefore, it should be a `struct` with `public` members. — John, Jun 15 '16 at 17:17

davidbak · Answer 4 · 2016-06-15T22:45:33.253

score 9

Getters and setters are useful if one believes that abstraction is warranted to insulate users from design choices and changes in those choices, now or in the future.

The typical example for "now" is that the setter/getter might have logic to validate and/or calculate the value - e.g., use a setter for a phone number, instead of directly exposing the field, so that you can check the format; use a getter for a collection so that the getter can provide a read-only view of the member's value (a collection) to the caller.

The canonical (though bad) example for "changes in the future" is Point - should you expose x and y or getX() and getY()? The usual answer is to use getters/setters because at some time in the future you might want to change the internal representation from Cartesian to polar and you don't want your users to be impacted (or to have them depend on that design decision).
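
For readers who have not seen the textbook argument, here is roughly what it looks like, as a sketch only. This `Point` class, its `from_cartesian` factory, and `demo_point` are hypothetical names; the assumption is that callers keep seeing Cartesian values while the storage is polar.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical Point that stores polar coordinates internally
// but keeps presenting Cartesian x/y to its callers.
class Point {
public:
    static Point from_cartesian(double x, double y) {
        return Point(std::hypot(x, y), std::atan2(y, x));
    }
    double x() const { return r_ * std::cos(theta_); }
    double y() const { return r_ * std::sin(theta_); }
private:
    Point(double r, double theta) : r_(r), theta_(theta) {}
    double r_;      // distance from the origin
    double theta_;  // angle in radians
};

inline void demo_point() {
    const Point p = Point::from_cartesian(3.0, 4.0);
    // Values round-trip only up to floating-point error.
    assert(std::fabs(p.x() - 3.0) < 1e-9);
    assert(std::fabs(p.y() - 4.0) < 1e-9);
}
```

Note that the round trip is only approximate, which is one practical reason the example is often called a bad one.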

In the case of std::pair - it is the intent that this class now and forever represent two and exactly two values (of arbitrary type) directly, and provide their values on demand. That's it. And that's why the design uses direct member access, rather than go through a getter/setter.

edited Jun 15 '16 at 22:45

answered Jun 15 '16 at 22:34 by davidbak

I've honestly *never* seen that justification for encapsulating access to a Point class's members. It strikes me as ridiculous on first blush. It is an extremely leaky abstraction. The consumers of that class, in its first iteration, would see that it returns Cartesian coordinates and would be completely justified in passing those to some other function that assumed Cartesian coordinates. If later changed to return polar coordinates, granted it wouldn't change interactions with other Point objects, but it would still break client code. The coordinate representation is part of the interface. – Cody Gray - on strike Jun 16 '16 at 05:06
In fact, the only logical justification I've ever seen provided for getters and setters on a Point class, and the reason I've designed a Point class that way myself, is your first "P.S." in that blog post—namely, validation. Either now, or at some time in the future, you might want to enforce certain restrictions upon the range of permissible values. I don't really understand how you can just dismiss that at the end with a "P.S.", as if that were a completely insignificant consideration in designing a public interface. – Cody Gray - on strike Jun 16 '16 at 05:09
@CodyGray - it's in at least two textbooks (older ones I have). And the reasons I dismissed it is because, in one of the textbooks, validation wasn't offered as a reason for using setters, and in both, the primary reason given was the one I stated. – davidbak Jun 16 '16 at 05:27
@CodyGray One of us has things completely backwards, because it looks to me like the cited change in coordinate spaces is _precisely_ insulated from the users, referring only to storage/processing internally, while users continue to see and send Cartesian coordinates exactly as they did before. – underscore_d Jun 16 '16 at 07:20
@under Oh. Maybe you are right. I definitely did not pick up on that the first time I read it, and even skimming it a second time, I don't see where there is any explicit reference to continuing to present Cartesian coordinates. I guess that makes more sense, though, and seems much more plausible (on first blush). – Cody Gray - on strike Jun 16 '16 at 07:22
The canonical example IMO is rectangle: do you store `{x,y,width,height}` or `{left,top,right,bottom}`? Both are entirely reasonable, so either `setWidth` or `setRight` is a simple field accessor, but not both. – MSalters Jun 16 '16 at 09:27
@CodyGray: It's too bad there's no means of defining a class as having fields which are read-only to the outside world (but not necessarily final), or having a derived class which makes such fields writable. For points, there should be an abstract `ReadablePoint` with concrete derivatives `MutablePoint` and `ImmutablePoint`. It's useful to be able to have an object which can be passed to something for purposes of receiving an x and y, and also to have an object that can be passed to something without having to worry about whether the recipient will modify it. – supercat Jun 16 '16 at 14:45
@curiousguy: Obviously, both make `getWidth` and `getRight` return the respective values, but the two are linked by `getWidth=getRight-getLeft` ignoring any 1-off definition matters. Now `setWidth` may keep the left, right or center untouched, but that's not too relevant for this example. – MSalters Dec 27 '18 at 00:50
@MSalters "_may keep the left, right or center untouched_" so that naming convention isn't so great, is it? Or the whole idea of abstracting that way isn't great. – curiousguy Dec 27 '18 at 23:52
@curiousguy - look at the Rect classes for Windows, Android, Java as examples: they'll provide a full set of getters (or naked fields) for x, y, height, width, right, bottom but only a minimal set of setters e.g., x, y, height, width or x, y, right, bottom. That's the way the "abstracting" is handled. As far as naming conventions go … it's a convention, and usually useful. Doesn't have to be an exact match for every situation. (TBH not sure what MSalters meant by "setWidth may keep the left right or center untouched") – davidbak Dec 28 '18 at 00:25
@davidbak How is `setWidth` specified? What is its postcondition? – curiousguy Dec 28 '18 at 13:09
In cases where the rectangle is specified by top left (or bottom left, sometimes) plus height and width, setWidth is obvious. In cases where it is specified by top left and bottom right, it usually isn't provided. But if it is provided, e.g., in Qt, the documentation clearly states how it is specified. (In the case of Qt the documentation states "The right edge is changed, but not the left one". It declares the behavior of all the setters, in fact, and setX/Y are not what I'd expect.) – davidbak Dec 28 '18 at 14:41

score 8 · Answer 5 · answered Jun 15 '16 at 16:38

It could be argued that std::pair would be better off having accessor functions to access its members! Notably, for degenerate cases of std::pair there could be an advantage. For example, when at least one of the types is an empty, non-final class, the objects could be smaller (the empty type could be made a base class, which wouldn't need its own address).
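
Here is a rough sketch of that idea, in the spirit of a compressed pair. `CompressedPair` and `EmptyDeleter` are hypothetical names written for this post, not Boost's actual implementation, and the sketch assumes C++14 for `std::is_final`. It ignores corner cases such as both types being the same empty class.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct EmptyDeleter {};  // hypothetical stateless type

// Primary template: First is an empty, non-final class, so it is inherited
// from instead of being stored as a member (empty base optimization).
template <typename First, typename Second,
          bool UseEbo = std::is_empty<First>::value && !std::is_final<First>::value>
class CompressedPair : private First {
public:
    CompressedPair(First f, Second s)
        : First(std::move(f)), second_(std::move(s)) {}
    First& first() { return static_cast<First&>(*this); }
    Second& second() { return second_; }
private:
    Second second_;
};

// Fallback: First is not empty (or is final), so both are plain members.
template <typename First, typename Second>
class CompressedPair<First, Second, false> {
public:
    CompressedPair(First f, Second s)
        : first_(std::move(f)), second_(std::move(s)) {}
    First& first() { return first_; }
    Second& second() { return second_; }
private:
    First first_;
    Second second_;
};

inline void demo_sizes() {
    // The empty base adds no storage here, while std::pair must give
    // its empty member a distinct address and therefore grows.
    assert(sizeof(CompressedPair<EmptyDeleter, int*>) == sizeof(int*));
    assert(sizeof(std::pair<EmptyDeleter, int*>) > sizeof(int*));
}
```

The trick only works because `first()` is a function: a public data member named `first` would have to exist as a separate subobject.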

At the time std::pair was invented these special cases were not considered (and I'm not sure if the empty base optimization was allowed in the draft working paper at that time). From a semantic point there isn't much reason to have accessor functions, though: clearly, the accessors would need to return a mutable reference for non-const objects. As a result the accessor does not provide any form of encapsulation.
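
A small sketch of why such accessors would not encapsulate anything. `AccessorPair` and `overwrite_through_accessor` are hypothetical; the class is simply what a pair with accessor functions might look like.

```cpp
#include <cassert>
#include <utility>

// Hypothetical pair-like type with accessor functions instead of public members.
template <typename T1, typename T2>
class AccessorPair {
public:
    AccessorPair(T1 a, T2 b) : first_(std::move(a)), second_(std::move(b)) {}
    T1& first() { return first_; }              // mutable access for non-const objects
    const T1& first() const { return first_; }
    T2& second() { return second_; }
    const T2& second() const { return second_; }
private:
    T1 first_;
    T2 second_;
};

inline int overwrite_through_accessor() {
    AccessorPair<int, int> p{1, 2};
    p.first() = 99;           // callers can still write anything they like
    int& alias = p.second();  // and even keep a reference into the object
    alias = 7;
    assert(p.second() == 7);
    return p.first() + p.second();
}
```

The members are nominally private, yet every caller has exactly the same power as with public data.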

On the other hand, accessor functions make it [slightly] harder for the optimizer to see what's going on, e.g. because additional sequence points are introduced. I could imagine that Meng Lee and Alexander Stepanov even measured whether there is a difference (I have not measured it myself). Even if they didn't, providing access to the members directly is certainly not slower than going through an accessor function, while the reverse is not necessarily true.

I wasn't part of the decision and the C++ standard doesn't have a rationale but I guess it was a deliberate decision to make the members public data members.

score 4 · Answer 6 · 2016-06-16T17:42:41.873

The primary purpose of getters and setters is to gain control over access. That is to say, if you expose "first" as a variable, any code can read it and (if it is not const) write to it without telling the class it is a part of. In a number of cases, that can pose serious problems.

For example, say you have a class that represents the number of passengers on a boat. You store the number of passengers as an integer. If you expose that number as a bare variable, it would be possible for external functions to write to it. That could leave you in a case where there are actually 10 passengers, but someone changed the variable (perhaps accidentally) to be 50. This is a case for a getter on the number of passengers (but not a setter, which would present the same problem).
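
In code, the boat case might look something like this. The `Boat` class and its member names are invented for illustration, and the capacity check is an extra assumption of mine, not something the paragraph above requires.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical Boat: the count can be read by anyone, but it only changes
// through board() and disembark(), so it always matches reality.
class Boat {
public:
    explicit Boat(std::size_t capacity) : capacity_(capacity) {}
    void board() {
        if (passengers_ == capacity_) throw std::runtime_error("boat is full");
        ++passengers_;
    }
    void disembark() {
        if (passengers_ == 0) throw std::runtime_error("boat is empty");
        --passengers_;
    }
    std::size_t passenger_count() const { return passengers_; }  // getter only
private:
    std::size_t capacity_;
    std::size_t passengers_ = 0;
};

inline void demo_boat() {
    Boat b(10);
    b.board();
    b.board();
    b.disembark();
    assert(b.passenger_count() == 1);
}
```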

An example for getters and setters would be a class which represents a mathematical vector in which you want to cache certain information about the vector. Say you want to store the length. In this case, changing vec.x would probably change the length/magnitude. So, not only do you need to make x wrapped in a getter, you must provide a setter for x, which knows to update the vector's cached length. (Of course, most actual math libraries do not cache these values, and thus expose the variables.)
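
A minimal sketch of the cached-length case follows. `CachedVec2` is a hypothetical class for this post; as noted above, real math libraries usually skip the cache.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 2D vector that caches its length, so every write to a
// component has to go through a setter that refreshes the cache.
class CachedVec2 {
public:
    CachedVec2(double x, double y) : x_(x), y_(y), length_(std::hypot(x, y)) {}
    double x() const { return x_; }
    double y() const { return y_; }
    double length() const { return length_; }  // no recomputation on read
    void set_x(double x) { x_ = x; refresh(); }
    void set_y(double y) { y_ = y; refresh(); }
private:
    void refresh() { length_ = std::hypot(x_, y_); }
    double x_;
    double y_;
    double length_;
};

inline void demo_cached_length() {
    CachedVec2 v(3.0, 4.0);
    assert(std::fabs(v.length() - 5.0) < 1e-9);
    v.set_x(0.0);  // the setter keeps the cached length in sync
    assert(std::fabs(v.length() - 4.0) < 1e-9);
}
```

If `x_` were public, a plain `v.x_ = 0.0` would leave `length_` stale, which is exactly the kind of state `std::pair` does not have.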

So the question you ought to ask yourself in the context of using them is: is this class ever conceivably going to need to control or be alerted to changes to this variable?

The answer in something like std::pair is a flat "no". There is no case for controlling access to members in a class whose sole purpose is to contain those members. There certainly is no need for pair to know if those variables have been touched, considering those are its only two members, and thus it has no state to update should either change. pair is ignorant of what it actually contains and its meaning, so tracking what it contains is not worth the effort.

Depending on the compiler and how it is configured, getters and setters can introduce overhead. That's probably not important in most cases, but if you were to put them on something fundamental like std::pair, it would be a non-trivial concern. As such, their addition would need to be justified, which, as I just explained, it cannot be.

jzl106 · Answer 7 · 2017-04-12T20:04:50.070

I was appalled by the number of comments that show no basic understanding of object-oriented design (does that prove C++ is not an OO language?). Yes, the design of std::pair has some historical traits, but that does not make a bad design good; nor should it be used as an excuse to deny the fact. Before I rant on it, let me answer some of the questions in the comments:

"Don't you think int should also have a setter and getter"

Yes, from a design point of view we should use accessors because by doing so we lose nothing but gain additional flexibility. Some newer algorithms may want to pack additional bits into the key/values, and you cannot encode/decode them without accessors.

"Why wrap something in a getter if there is no logic in the getter?"

How do you know there would be no logic in the getter/setter? A good design should not limit the possibilities of implementation based on guesswork. It should offer as much flexibility as possible. Remember the design of std::pair also decides the design of the iterator, and by requiring users to directly access member variables, the iterator has to return structures that actually store key/values together. That turns out to be a big limitation. There are algorithms that need to keep them separate. There are algorithms that don't store key/values explicitly at all. Now they have to copy the data during iteration.

Contrary to popular belief, having objects that do nothing but store member variables with getters and setters is not "the way things should be done"

Another wild guess.

OK, I would stop here.

To answer the original question: std::pair chose to expose member variables because whoever designed it did not recognize and/or prioritize the importance of a flexible contract. They obviously had a very narrow idea/vision about how key-value pairs in a map/hashtable should be implemented, and to make it worse, they let such a narrow view on implementation spill over the top to compromise the design. For example, what if I want to implement a replacement of std::unordered_map that stores keys and values in separate arrays based on an open addressing scheme with linear probing? This can greatly boost cache performance for pairs with small keys and large values, as you don't need to jump across the space occupied by values to probe the keys. Had std::pair chosen accessors, it would be trivial to write an STL-style iterator for this. But now it is simply impossible to achieve this without incurring additional data copying.
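
To show the storage layout I mean, here is a bare-bones sketch. `SplitStorage` and its `at` function are hypothetical, there is no hashing or probing in it, and it makes no claim to be a drop-in replacement for `std::unordered_map`. It only shows that with separate arrays no stored `std::pair<const K, V>` object exists, so element access has to hand out some freshly built object instead of a reference to a stored pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container that keeps keys and values in separate arrays.
template <typename K, typename V>
class SplitStorage {
public:
    void push_back(K key, V value) {
        keys_.push_back(std::move(key));
        values_.push_back(std::move(value));
    }
    std::size_t size() const { return keys_.size(); }

    // No std::pair<const K, V> lives in memory here, so a reference to one
    // cannot be returned. A temporary pair of references is built instead.
    std::pair<const K&, V&> at(std::size_t i) {
        return std::pair<const K&, V&>(keys_[i], values_[i]);
    }
private:
    std::vector<K> keys_;    // scanned on their own while probing
    std::vector<V> values_;  // touched only once the key is found
};

inline void demo_split_storage() {
    SplitStorage<std::string, int> s;
    s.push_back("small key", 1);
    auto entry = s.at(0);
    entry.second = 2;  // writes through to the values array
    assert(s.at(0).second == 2);
    assert(s.at(0).first == "small key");
}
```

The returned object is a different type from the `std::pair<const K, V>&` that the standard map iterators give you, which is where the incompatibility comes from.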

I noticed that they also mandate the use of open hashing (i.e., separate chaining, also called closed addressing) for the implementation of std::unordered_map. This is not only strange from a design point of view (why would you want to restrict how things are implemented???), but also pretty dumb in terms of implementation: chained hashtables using linked lists are perhaps the slowest of all categories. Go google the web, and you can easily find that std::unordered_map is often the doormat of a hashtable benchmark. It even tends to be slower than Java's HashMap (I don't know how they managed to lag behind in this case, as HashMap is also a chained hashtable). An old excuse is that a chained hashtable tends to perform better when the load_factor approaches 1, which is totally invalid because 1) there are plenty of techniques in the open addressing family to deal with this problem (ever heard of hopscotch or robin-hood hashing? The latter has actually been there for 30 freakish years); 2) a chained hashtable adds the overhead of a pointer (a good 8 bytes on 64-bit machines) for each entry, so when we say the load_factor of an unordered_map approaches 1, it is not 100% memory usage! We should take that into consideration and compare the performance of unordered_map with alternatives with the same memory usage. And it turns out that alternatives like Google Dense HashMap are 3-4 times faster than std::unordered_map.

Why is this relevant? Because interestingly, mandating open hashing does make the design of std::pair look less bad, now that we do not need the flexibility of an alternative storage structure. Moreover, the presence of std::pair makes it almost impossible to adopt newer/better algorithms to write a drop-in replacement of std::unordered_map. Sometimes you wonder whether they did that intentionally so that the poor design of std::pair and the pedestrian implementation of std::unordered_map can survive longer together. Of course I am kidding, so whoever wrote those, don't get offended. In fact people using Java or Python (OK, I admit Python's a stretch) would want to thank you for making them feel good about being "as fast as C++".
