Set vs. Multiset

Question

I'd like to store about 100 to 1000 different relations of type <Object A, Relation R, Object B> in a set or multiset. I'd like to be able to search for A and for (A,R), but not for (A,R,B). There will be only a few (<5) relations with the same A and R, so a linear search is fine at that point.

Is it better to store the relations in a set (ordered by A, R and B) or to store them in a multiset ordered by A and R?

Edit: I've looked into hash tables, but their iteration isn't as fast as (ordered) set iteration, and the pattern matching requires a lot of iteration too. (It will have to search once to find the start of the iteration and then iterate until all relations with the same object A are done.)

Thanks, Ragnar

How about storing them in a vector? For 1000 elements, my money is on that being the fastest implementation. — Kerrek SB, Aug 25 '13 at 19:43
The program is going to search the set/vector quite often, because it has to do a lot of pattern matching on different relations (the program is a geometry problem solver, and it must find situations where it can apply a certain theorem) — Ragnar, Aug 25 '13 at 19:48
@Ragnar: The question isn't really how often the searching has to be done but rather when the complex structure of the maps pays off against the simpler structure of the vectors. The number of elements before it pays off to use a map tends to be much higher than what people expect. — Dietmar Kühl, Aug 25 '13 at 20:02

cmaster - reinstate monica · Answer 1 · 2013-08-25T20:10:49.990


From the comments I gather that you have to look up exact matches, either for A or for the pair (A, R).

If this is correct, the best thing you can do is to use two hash tables: one with A as keys, the other with (A, R) as keys. The relations themselves can be stored in an unsorted vector, with indices to them inserted into the two hash tables. This is the only way you will get (expected) O(1) complexity for your lookup task.

It is crucial to the performance that you have two independent index structures: If you only use A as keys, you will get lists of objects which you have to search for a suitable R when looking for an (A, R) pair. In the opposite case, if you only have (A, R) keys, you will not be able to look up just a certain A in O(1) time.
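A minimal sketch of the two-index layout described above, assuming objects are identified by plain integer IDs and relations by a small enum. The names `ObjectId`, `Rel`, `Relation`, `PairHash` and `RelationStore` are made up for illustration; the question does not say how objects or relations are actually represented.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types, not taken from the question.
using ObjectId = int;
enum class Rel : std::uint8_t { Parallel, Perpendicular, Congruent };

struct Relation {
    ObjectId a;
    Rel r;
    ObjectId b;
};

// std::pair has no std::hash specialization, so (A, R) keys need their own hash.
struct PairHash {
    std::size_t operator()(const std::pair<ObjectId, Rel>& k) const {
        const std::size_t h1 = std::hash<ObjectId>{}(k.first);
        const std::size_t h2 = std::hash<int>{}(static_cast<int>(k.second));
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

class RelationStore {
public:
    // Append the relation to the unsorted vector and record its index in both tables.
    void add(ObjectId a, Rel r, ObjectId b) {
        const std::size_t idx = relations_.size();
        relations_.push_back({a, r, b});
        by_a_[a].push_back(idx);
        by_ar_[{a, r}].push_back(idx);
    }

    // Indices of all relations with the given A (empty if there are none).
    const std::vector<std::size_t>& find(ObjectId a) const {
        auto it = by_a_.find(a);
        return it == by_a_.end() ? empty_ : it->second;
    }

    // Indices of all relations with the given (A, R) (empty if there are none).
    const std::vector<std::size_t>& find(ObjectId a, Rel r) const {
        auto it = by_ar_.find({a, r});
        return it == by_ar_.end() ? empty_ : it->second;
    }

    const Relation& at(std::size_t idx) const { return relations_[idx]; }

private:
    std::vector<Relation> relations_;
    std::unordered_map<ObjectId, std::vector<std::size_t>> by_a_;
    std::unordered_map<std::pair<ObjectId, Rel>, std::vector<std::size_t>, PairHash> by_ar_;
    std::vector<std::size_t> empty_;  // returned when a key is absent
};
```

Both `find` overloads return a short list of indices into the vector, so the few relations that share an (A, R) can then be scanned linearly for B. Removing relations is not handled here; that would need extra bookkeeping in both tables.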

edited Aug 25 '13 at 20:10

answered Aug 25 '13 at 20:03

cmaster - reinstate monica


I have never used hash tables before, but it might pay off to learn something about them, because O(1) lookup would probably speed up the program. – Ragnar Aug 25 '13 at 20:12
Isn't it possible to use the (A,R) key only and sort by A first, and then ascending by (the enum) R and search lowerbounds for (A,0) and (A,255)? – Ragnar Aug 25 '13 at 20:19
No, it would not be possible in O(1) time, because you would have to be able to look up a position for a (A,R) pair that does not exist. And that cannot be done with hash tables. It would be possible with other index structures like binary trees or binary search in a sorted array, but that would imply O(log N) time complexity. – cmaster - reinstate monica Aug 25 '13 at 20:21
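For comparison, the O(log N) alternative mentioned in the last comment (binary search in an array sorted by A, then R) could look roughly like the sketch below. The types `ObjectId`, `Rel` and `Relation` and the helpers `by_a_r` and `range_for` are assumptions made for illustration, not something from the question.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical types, not taken from the question.
using ObjectId = int;
enum class Rel : std::uint8_t { Parallel, Perpendicular, Congruent };

struct Relation {
    ObjectId a;
    Rel r;
    ObjectId b;
};

// Ordering by A first, then R; B is deliberately ignored.
inline bool by_a_r(const Relation& x, const Relation& y) {
    return std::tie(x.a, x.r) < std::tie(y.a, y.r);
}

using Iter = std::vector<Relation>::const_iterator;

// Half-open range of all relations with the given A.
// `sorted` must already be sorted with by_a_r.
inline std::pair<Iter, Iter> range_for(const std::vector<Relation>& sorted, ObjectId a) {
    auto first = std::lower_bound(sorted.begin(), sorted.end(), a,
        [](const Relation& x, ObjectId key) { return x.a < key; });
    auto last = std::upper_bound(first, sorted.end(), a,
        [](ObjectId key, const Relation& x) { return key < x.a; });
    return {first, last};
}

// Half-open range of all relations with the given (A, R).
inline std::pair<Iter, Iter> range_for(const std::vector<Relation>& sorted, ObjectId a, Rel r) {
    const Relation probe{a, r, 0};  // the B field is not compared
    return std::equal_range(sorted.begin(), sorted.end(), probe, by_a_r);
}
```

Because the binary search works on positions rather than on existing keys, a lookup for an A or an (A, R) that is not present simply yields an empty range. The price is O(log N) per lookup instead of expected O(1), and the vector has to be re-sorted (or kept sorted on insertion) whenever relations are added.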
