Map, pair-vector or two vectors...?

Question

I read through some posts and "wikis" but still cannot decide what approach is suitable for my problem.

I am creating a class called Sample which contains a certain number of compounds (let's say each is an instance of another class, Nuclide), each at a certain relative quantity (double).

Thus, something like (pseudo):

class Sample {
    std::map<Nuclide, double> contents;  // nuclide -> relative quantity
};

If I had the nuclides Ba-133, Co-60 and Cs-137 in the sample, I would have to use exactly those names in code to access those nuclides in the map. However, the only thing I need to do is iterate through the map to perform calculations (which nuclides they are is of no interest), so I will use a for loop. I want to iterate without paying any attention to the key names, so I would need to use an iterator for the map, am I right?

An alternative would be a vector<pair<Nuclide, double> >

class Sample {
    std::vector<std::pair<Nuclide, double> > contents;  // (nuclide, relative quantity)
};

or simply two independent vectors

class Sample {
    std::vector<Nuclide> nuclides;
    std::vector<double> quantities;  // quantities[i] belongs to nuclides[i]
};

In the last option, the link between a nuclide and its quantity would be "meta-information", given only by the position in the respective vector.

Due to my lack of profound experience, I'd ask kindly for suggestions of what approach to choose. I want to have the iteration through all available compounds to be fast and easy and at the same time keep the logical structure of the corresponding keys and values.

PS: It's possible that the number of compounds in a sample is very low (1 to 5)!

PPS: Could the last option be modified by some const statements to prevent changes and thus keep the correct order?

Either of the first two are good. I imagine the first will provide you a more useful interface (e.g. access by key). I'd only start looking further if you start measuring performance issues. — Joseph Mansfield, Dec 31 '14 at 10:53
Use [`boost::container::flat_map<>`](http://www.boost.org/doc/libs/release/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx) to get the best of both semantics and performance. — ildjarn, Dec 31 '14 at 10:53
If you have fixed number of Nuclide's you can turn them into enum and use double[NUCLIDES_COUNT]. More memory is used, but smaller access time. — lonewasp, Dec 31 '14 at 10:58
Are you doing any look-ups in the structure based on a `Nuclide`? Also, from your description it isn't clear whether both the `Nuclide` and the `double` are used for the computation or only the `double`. Finally, does the order of elements in the sequence matter (other than that `Nuclide` and the `double` go together)? — Dietmar Kühl, Dec 31 '14 at 11:00
@DietmarKühl Primarily it will be properties of the `Nuclides` to be used in calculations, actually. However, the calculations will change over time, different calculations will be performed - this is unfortunately not clear at the moment, thus I need maximum flexibility. (All this is part of a simulation that will be refined over time.) I want the `Sample` class to provide well structured information which can then be accessed in an easy manner, mostly for calculations in for-loops... The order does not matter. — LCsa, Dec 31 '14 at 11:06
I'd go for vector of pairs. Flexible and fast if sorted. map is more if you do a lot of inserts or removals. — user2672165, Dec 31 '14 at 15:39
Do I understand correctly that the key of a map is always addressed by the name I chose for the key variable (like for example `Ba-133` or `Co-60`) and not by an index (like `0` and `1`) as it would be in a vector? It's important for me that the elements _have_ a name, but those names have to be _irrelevant_ during coding! (The name is important for an `About()` function, for instance, but computing has to be done for all key-value pairs in the same manner, no matter what.) — LCsa, Jan 02 '15 at 12:48

Dietmar Kühl · Accepted Answer · 2014-12-31T12:07:50.547

If iteration needs to be fast, you don't want std::map<...>: its iteration is a tree walk over separately allocated nodes, which has poor cache locality compared to contiguous storage. std::map<...> is really only reasonable if you have many mutations to the sequence and you need the sequence ordered by the key. If you have mutations but you don't care about the order, std::unordered_map<...> is generally a better alternative. Both kinds of maps assume you are looking things up by key, though. From your description I don't really see that to be the case.
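Whichever map is used, iterating it never requires naming a key. The following is a minimal sketch of that, not the asker's actual class: the `Nuclide` struct with a single `name` member, the ordering by name, and the function `total_quantity` are all assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the question's Nuclide class.
struct Nuclide {
    std::string name;
};

// std::map needs a strict weak ordering on its key type.
// Ordering by name is an assumption made for this sketch.
inline bool operator<(const Nuclide& a, const Nuclide& b) {
    return a.name < b.name;
}

// Sums the relative quantities without mentioning any specific key.
inline double total_quantity(const std::map<Nuclide, double>& contents) {
    double total = 0.0;
    // Each entry is a std::pair<const Nuclide, double>.
    for (const auto& entry : contents) {
        total += entry.second;
    }
    return total;
}
```

The range-based for loop uses the map's iterators internally, so the key names only matter where the entries are inserted, not where they are processed.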

std::vector<...> is fast to iterate. It isn't ideal for look-ups, though. If you keep it ordered you can use std::lower_bound() to do a std::map<...>-like look-up (i.e., the complexity is also O(log n)), but the effort of keeping it sorted may make that option too expensive. However, it is an ideal container for keeping a bunch of objects together which are iterated.
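The sorted-vector look-up mentioned above could look roughly like this. It is a sketch under assumptions: `Nuclide` is reduced to a hypothetical struct with a `name` member, entries are ordered by that name, and the helpers `find_quantity` and `insert_sorted` are invented for this example.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Nuclide class.
struct Nuclide {
    std::string name;
};

using Entry = std::pair<Nuclide, double>;

// Compares an entry against a bare name, as std::lower_bound expects
// (element first, searched value second).
inline bool name_less(const Entry& e, const std::string& key) {
    return e.first.name < key;
}

// Returns a pointer to the quantity stored for `name`, or nullptr if absent.
// Precondition: `entries` is sorted by nuclide name.
inline const double* find_quantity(const std::vector<Entry>& entries,
                                   const std::string& name) {
    auto it = std::lower_bound(entries.begin(), entries.end(), name, name_less);
    if (it != entries.end() && it->first.name == name) {
        return &it->second;
    }
    return nullptr;
}

// Inserts an entry at the position that keeps the vector sorted.
// Finding the position is O(log n); the insert itself shifts elements, O(n).
inline void insert_sorted(std::vector<Entry>& entries, Entry entry) {
    auto pos = std::lower_bound(entries.begin(), entries.end(),
                                entry.first.name, name_less);
    entries.insert(pos, std::move(entry));
}
```

For the 1 to 5 entries mentioned in the question, a plain linear search would likely do just as well; the sorted variant mainly pays off for larger containers that are rarely modified.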

Whether you want one std::vector<std::pair<...>> or rather two std::vector<...>s depends on how the elements are accessed: if both parts of an element are bound to be accessed together, you want a std::vector<std::pair<...>> as that keeps data which is accessed together in one place. On the other hand, if you normally only access one of the two components, using two separate std::vector<...>s will make the iteration faster as more elements fit into a cache line, especially if they are reasonably small like doubles.
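To make the two layouts concrete, here is a small sketch of both. All names (`SamplePairs`, `SampleColumns`, `add`, `total_quantity`) and the reduced `Nuclide` struct are assumptions for illustration. The second class also hints at one way to address the PPS from the question: keeping both vectors private and growing them only through a single member function keeps the positions in sync.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Nuclide class.
struct Nuclide {
    std::string name;
};

// Layout A: each nuclide is stored right next to its quantity.
struct SamplePairs {
    std::vector<std::pair<Nuclide, double>> entries;
};

// Layout B: two parallel vectors; index i links nuclides()[i] to quantities()[i].
class SampleColumns {
public:
    // The only way to grow the sample, so both vectors always have equal length.
    void add(Nuclide nuclide, double quantity) {
        nuclides_.push_back(std::move(nuclide));
        quantities_.push_back(quantity);
    }
    const std::vector<Nuclide>& nuclides() const { return nuclides_; }
    const std::vector<double>& quantities() const { return quantities_; }

private:
    std::vector<Nuclide> nuclides_;
    std::vector<double> quantities_;
};

// Layout A has to step over every Nuclide to reach the doubles.
inline double total_quantity(const SamplePairs& sample) {
    double total = 0.0;
    for (const auto& entry : sample.entries) {
        total += entry.second;
    }
    return total;
}

// Layout B reads one contiguous array of doubles and never touches a Nuclide.
inline double total_quantity(const SampleColumns& sample) {
    double total = 0.0;
    for (double quantity : sample.quantities()) {
        total += quantity;
    }
    return total;
}
```

Which layout is faster in practice depends on the real size of `Nuclide` and on which members the calculations read, so it is worth measuring before committing to either.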

In any case, I'd recommend not exposing the internal structure to the outside world and instead providing an interface which lets you change the underlying representation later. That is, to achieve maximum flexibility you don't want to bake the representation into all your code. For example, if you use accessor function objects (property maps in terms of BGL or projections in terms of Eric Niebler's Range Proposal) to access the elements based on an iterator, rather than accessing the elements directly, you can change the internal layout without having to touch any of the algorithms (you'll need to recompile the code, though):

// version using std::vector<std::pair<Nuclide, double> >
// - it would just use std::vector<std::pair<Nuclide, double> >::iterator as iterator
// - Sample::key is assumed here to name std::pair<Nuclide, double>
auto nuclide_projection = [](Sample::key& key) -> Nuclide& {
    return key.first;
};
auto value_projection = [](Sample::key& key) -> double& {
    return key.second;
};

// version using two std::vectors:
// - it would use an iterator interface to an integer, yielding a std::size_t for *it
// - the projectors hold references, so they cannot be default-constructed constexpr
//   objects; they are bound to the Sample's vectors instead (the names
//   sample_nuclides and sample_values are placeholders)
struct nuclide_projector {
    std::vector<Nuclide>& nuclides;
    auto operator()(std::size_t index) const -> Nuclide& { return nuclides[index]; }
};
nuclide_projector nuclide_projection{sample_nuclides};
struct value_projector {
    std::vector<double>& values;
    auto operator()(std::size_t index) const -> double& { return values[index]; }
};
value_projector value_projection{sample_values};

With one pair of these in place, an algorithm that simply runs over the elements and prints them could look like this:

template <typename Iterator>
void print(std::ostream& out, Iterator begin, Iterator end) {
    for (; begin != end; ++begin) {
        out << "nuclide=" << nuclide_projection(*begin) << ' '
            << "value=" << value_projection(*begin) << '\n';
    }
}

Both representations are entirely different but the algorithm accessing them is entirely independent. This way it is also easy to try different representations: only the representation and the glue to the algorithms accessing it need to be changed.
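For readers who want to try the idea, here is a compilable sketch that runs one algorithm over both layouts. It deviates from the snippets above in one respect: the projections are passed to `print` as parameters instead of being global objects, which is a choice made for this sketch only. The reduced `Nuclide` struct, its `operator<<`, the `index_iterator`, and the helpers `print_pairs` and `print_columns` are likewise assumptions for illustration.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Nuclide class.
struct Nuclide {
    std::string name;
};

inline std::ostream& operator<<(std::ostream& out, const Nuclide& n) {
    return out << n.name;
}

// The algorithm knows only iterators and two projections, not the layout.
template <typename Iterator, typename NuclideProj, typename ValueProj>
void print(std::ostream& out, Iterator begin, Iterator end,
           NuclideProj nuclide_of, ValueProj value_of) {
    for (; begin != end; ++begin) {
        out << "nuclide=" << nuclide_of(*begin) << ' '
            << "value=" << value_of(*begin) << '\n';
    }
}

// Minimal iterator over indices: *it yields a std::size_t.
struct index_iterator {
    std::size_t index;
    std::size_t operator*() const { return index; }
    index_iterator& operator++() { ++index; return *this; }
    bool operator!=(const index_iterator& other) const { return index != other.index; }
};

// Representation 1: one vector of pairs, iterated with the vector's own iterators.
inline std::string print_pairs(const std::vector<std::pair<Nuclide, double>>& entries) {
    using Entry = std::pair<Nuclide, double>;
    std::ostringstream out;
    print(out, entries.begin(), entries.end(),
          [](const Entry& e) -> const Nuclide& { return e.first; },
          [](const Entry& e) { return e.second; });
    return out.str();
}

// Representation 2: two parallel vectors, iterated by index.
// Precondition: both vectors have the same length.
inline std::string print_columns(const std::vector<Nuclide>& nuclides,
                                 const std::vector<double>& values) {
    std::ostringstream out;
    print(out, index_iterator{0}, index_iterator{nuclides.size()},
          [&nuclides](std::size_t i) -> const Nuclide& { return nuclides[i]; },
          [&values](std::size_t i) { return values[i]; });
    return out.str();
}
```

Given the same nuclides and quantities, `print_pairs` and `print_columns` produce the same text, while `print` itself is untouched by the choice of layout.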

Wow, something like your example code I've never seen before (I'm a physics student, so my experience with high-level-C++ is limited :P). I do not know about this `auto something = [](someref&) -> type {return...}`, what is happening... Your approach seems to be really good, however for my situation here a tad too much (at least for my level of knowledge...). But explaining the line(s) I quoted, could help me. Plus, I don't see how your algorithm example performs the same action with all elements between `begin` and `end` (I can't see the "loop")... Sorry for any inconvenience. — LCsa, Dec 31 '14 at 12:01
@ LCsa: the loop is simple: I forgot it :) It is added now. The `[](...)...` stuff is just a lambda function, i.e., a quick way to write a function object. The `auto` simply deduces its type (it isn't possible to write the type of a lambda expression). — Dietmar Kühl, Dec 31 '14 at 12:11
Ohhhhkay, that soothes me... in simpler words: the `Sample` class should provide "get" functions? — LCsa, Dec 31 '14 at 12:26
