1

I need to store unique objects in a container. The object provides an operator== and operator!= (but neither operator< nor operator>).

I can't use std::set, as it requires an operator<. I can't use std::unordered_set, as it requires a hash function and I have none. Let's say I can't write one for my object type (or I'm lazy).

Am I really forced to use a std::vector and make sure myself that items do not get duplicated in the container (using std::find, which uses operator==)?

Is there really no container that could be used to store unique items only using the operator==?

jpo38
  • 2
    `std::unordered_set` does not require `operator<` – Slava Aug 23 '16 at 14:04
  • [`std::unordered_set`](http://en.cppreference.com/w/cpp/container/unordered_set) does not require a comparison operator. That's the point of an "unordered" data structure – Nelfeal Aug 23 '16 at 14:04
  • 1
    As above. The only thing needed is to provide a hash function for the unique object. – BiagioF Aug 23 '16 at 14:05
  • 1
    http://stackoverflow.com/questions/28767234/what-container-to-store-unique-values --- DUPLICATE QUESTION – Cherkesgiller Aug 23 '16 at 14:06
  • 1
    @CherkesgillerTural: Actually, no, as it doesn't mention `operator<` at all. – MSalters Aug 23 '16 at 14:07
  • @Slava: True, my mistake, Edited the post. I'd like a container that would not require more code to be written. `operator==` should be enough to make a container store unique unordered values. – jpo38 Aug 23 '16 at 14:10
  • 2
    Then you're stuck with an O(N) uniqueness check. – LogicStuff Aug 23 '16 at 14:11
  • Use `std::set`. After reading the docs. – juanchopanza Aug 23 '16 at 14:13
  • The problem with using `operator==` only is that you can only compare one object to another. Given the result of `operator==` there's no way of knowing how it compares to other objects without comparing it with those as well. With `operator<` you have ordering in the container and can already eliminate half (on average) of the other objects. – Kevin Aug 23 '16 at 14:13
  • I'm not concerned about performance here. As my only alternative right now is to use `std::vector` and `std::find`. – jpo38 Aug 23 '16 at 14:14
  • @Kevin: Using `operator==` isn't really a "problem". It works perfectly when using `std::find` to check that an element is not already in a `std::vector` before pushing it, which guarantees there is no duplicate in the container. – jpo38 Aug 23 '16 at 14:16
  • @jpo38: that's not an alternative, if those are your requirements, that's the *solution*. – Karoly Horvath Aug 23 '16 at 14:16
  • As `std::find` only requires input iterators, you can use it with pretty much any container. Why do you think it works only with `std::vector`? – Slava Aug 23 '16 at 14:19
  • Can you write a custom comparison predicate? Sometimes you can – milleniumbug Aug 23 '16 at 14:20
  • @Slava: lack of operator< and hashing function rules out most of the containers. – Karoly Horvath Aug 23 '16 at 14:21
  • @milleniumbug: I don't own the library providing the objects. It would not be that easy. – jpo38 Aug 23 '16 at 14:21
  • @jpo38 and these objects don't have any externally visible state you can use for a consistent ordering or hash calculation? Wow, that's no fun. – jaggedSpire Aug 23 '16 at 14:25
  • @KarolyHorvath most? `std::find` will work on any container. Lack of a hash or `operator<` will eliminate only `std::set` and `std::unordered_set`, so any other container like `std::list`, `std::deque` or `std::forward_list` will work. – Slava Aug 23 '16 at 14:27
  • @Slava: I mean.. it eliminates the *good* choices. `vector` is the best choice left. – Karoly Horvath Aug 23 '16 at 14:35

3 Answers

4

There's indeed no standard container, and that's because it would be inefficient. O(N) per insertion or lookup, to be precise: exactly the brute-force search you imagine.

Both std::set<T> and std::unordered_set<T> avoid a brute-force search by taking advantage of a non-trivial property of T (an ordering for std::set, a hash function for std::unordered_set). Lacking either property, any of the existing N members of a container could be equal to a potential new value V, and you must therefore compare V against all N members using operator==.
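To make the O(N) cost concrete, here is a minimal sketch of what such a container would have to do. `Widget` and `LinearSet` are hypothetical names invented for this illustration (nothing like them exists in the standard library); `Widget` stands in for a type that offers only equality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library type: it has operator== and
// operator!=, but no operator< and no hash function.
struct Widget {
    int id;
    friend bool operator==(const Widget& a, const Widget& b) { return a.id == b.id; }
    friend bool operator!=(const Widget& a, const Widget& b) { return !(a == b); }
};

// Hypothetical "linear set": keeps its elements unique using only operator==.
template <typename T>
class LinearSet {
public:
    // Returns true if the value was inserted, false if an equal element
    // was already stored. Costs up to size() comparisons.
    bool insert(const T& value) {
        if (contains(value)) return false;
        items_.push_back(value);
        return true;
    }

    // Linear scan: with only operator== no element can be ruled out early.
    bool contains(const T& value) const {
        return std::find(items_.begin(), items_.end(), value) != items_.end();
    }

    std::size_t size() const { return items_.size(); }
    typename std::vector<T>::const_iterator begin() const { return items_.begin(); }
    typename std::vector<T>::const_iterator end() const { return items_.end(); }

private:
    std::vector<T> items_;
};

// Usage example.
void linear_set_example() {
    LinearSet<Widget> s;
    assert(s.insert(Widget{1}));
    assert(s.insert(Widget{2}));
    assert(!s.insert(Widget{1}));  // duplicate is rejected
    assert(s.size() == 2);
}
```

Every `insert` walks the whole vector, which is the brute-force search described above.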

MSalters
  • Do you mean that the standard will not add, let's call it, a "unique_vector" container only because it wouldn't be efficient? It's true the STL provides efficient code, but it also provides code to make developers' lives easier... no? – jpo38 Aug 23 '16 at 14:19
  • @jpo38: Yes, that's indeed the case. Same reason as why `std::list` doesn't have an O(N) `operator[]`; it could be implemented but wouldn't be fast. – MSalters Aug 23 '16 at 14:21
  • 1
    It wouldn't provide a unique_vector because `std::unordered_set` is already provided which is more efficient. If you can write an `operator==` it's a safe assumption that you can write a hash function in most cases. – Kevin Aug 23 '16 at 14:22
  • 2
    @Kevin: That's not a safe assumption, actually. Try writing a hash function that hashes `std::vector<std::string>`, for instance. Keep in mind that strings can contain any `char` including `\0` and a vector can contain empty strings. Equality is trivial, in comparison. – MSalters Aug 23 '16 at 14:24
  • @MSalters: what's the problem there? – Karoly Horvath Aug 23 '16 at 14:27
  • @jpo38 they cannot create containers for every possible case. Your case is unusual and I doubt such a container would be valuable to everybody. – Slava Aug 23 '16 at 14:30
  • @KarolyHorvath: Just try to make sure that `{"ab"}, {"a","b"}, {"a","", "b"}, {"a","b", ""}` etcetera all get different hashes. Of course, you can trivially hash everything to `1`, but then you get an O(N) collection. (Which to be honest would work in this particular case) – MSalters Aug 23 '16 at 14:31
  • @MSalters: Why on earth would they get the same hashes? You're imagining a problem there. – Karoly Horvath Aug 23 '16 at 14:41
  • @MSalters there's a difference between producing an occasional collision (like {"a","b"} having the same hash as {"a","b", ""}) and a `return 1;` hash function. If you know your data well, you can usually produce a reasonably well-spread hash even without processing all the content; maybe even summing the lengths of the strings is enough for some applications. – Ped7g Aug 23 '16 at 14:45
  • Surely adding `""` changes the hash (I'm not saying *always*, just with *sane* hashing functions). – Karoly Horvath Aug 23 '16 at 14:46
  • @KarolyHorvath if you made the hash by iterating over the vector and the vector's strings it would be examining the characters "ab" each time. Of course, if you knew that it would be storing such things, you'd at least factor in string ends to the hash, or string numbers and lengths... – jaggedSpire Aug 23 '16 at 14:47
  • @jaggedSpire: exactly. – Karoly Horvath Aug 23 '16 at 14:48
  • @KarolyHorvath I think what MSalters is trying to say is it's difficult to get a good general-purpose hash for a vector of strings, because you probably want to use only a subset of the information available to keep the hash fast, but which subset will provide the lowest collision rate would vary heavily by the characteristics of the strings in the vector. While in many cases using each character would generate fewer collisions than simply using the length of each string (at the price of a slow hash), in his example case it would be worse. – jaggedSpire Aug 23 '16 at 14:54
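The idea raised in these comments (factor the number of strings and each string's length into the hash) can be sketched as follows. `VectorOfStringsHash` and `Strings` are hypothetical names, and using FNV-1a as the mixing function is an assumption made for illustration, not something anyone in the thread proposed. It reads every character, so it is the "slow but thorough" end of the trade-off discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

using Strings = std::vector<std::string>;

// Hypothetical hash functor for std::vector<std::string>. It mixes in the
// element count and each string's length, so that inputs such as {"ab"} and
// {"a","b"} do not feed the same byte sequence into the hash.
struct VectorOfStringsHash {
    std::size_t operator()(const Strings& v) const {
        std::uint64_t h = 14695981039346656037ull;     // FNV-1a 64-bit offset basis
        const std::uint64_t prime = 1099511628211ull;  // FNV-1a 64-bit prime

        // Feed the 8 bytes of an integer into the running hash.
        auto mix_number = [&h, prime](std::uint64_t x) {
            for (int i = 0; i < 8; ++i) {
                h ^= (x >> (8 * i)) & 0xffu;
                h *= prime;
            }
        };

        mix_number(v.size());
        for (const std::string& s : v) {
            mix_number(s.size());     // string boundary information
            for (unsigned char c : s) {
                h ^= c;               // works for any char, including '\0'
                h *= prime;
            }
        }
        return static_cast<std::size_t>(h);
    }
};

// Usage example with the vectors mentioned in the comments.
void vector_hash_example() {
    std::unordered_set<Strings, VectorOfStringsHash> seen;
    seen.insert(Strings{"ab"});
    seen.insert(Strings{"a", "b"});
    seen.insert(Strings{"a", "", "b"});
    seen.insert(Strings{"a", "b", ""});
    seen.insert(Strings{"a", "b"});  // duplicate, ignored
    assert(seen.size() == 4);
}
```

Collisions remain possible, as with any hash, but they only cost extra `operator==` calls; the set stays correct either way.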
2

"Let's say I can't write a hash function considering my object type (or I'm lazy)."

Well, you're lazy, but I'll write one for you anyway: template<typename T> std::size_t degenerate_hash(T) { return 0; }

Of course, this means you get O(N) performance because every value collides with every other value, but that was the best possible outcome anyway.
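As a rough sketch of how that degenerate hash could be plugged into std::unordered_set: the container takes the hasher as a type, so the function is wrapped in a functor here. `DegenerateHash` and `Widget` are hypothetical names for illustration; `Widget` stands in for the real type that only has equality.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for the library type: equality only, no ordering.
struct Widget {
    int id;
    friend bool operator==(const Widget& a, const Widget& b) { return a.id == b.id; }
    friend bool operator!=(const Widget& a, const Widget& b) { return !(a == b); }
};

// Functor form of the degenerate hash: every value hashes to 0, so all
// elements share one bucket and lookups fall back to operator== comparisons.
struct DegenerateHash {
    template <typename T>
    std::size_t operator()(const T&) const { return 0; }
};

// Usage example.
void degenerate_hash_example() {
    std::unordered_set<Widget, DegenerateHash> s;
    s.insert(Widget{1});
    s.insert(Widget{2});
    s.insert(Widget{1});  // rejected: an equal element already exists
    assert(s.size() == 2);
    assert(s.count(Widget{2}) == 1);
}
```

The default key-equality predicate, std::equal_to, calls `operator==`, so nothing beyond the existing equality operator is required from the type.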

MSalters
1

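Use a std::vector and, before calling std::vector::push_back or std::vector::insert, first use std::find to check whether the element already exists in the vector.

Or, at the end of all insertions, use std::unique in combination with std::vector::erase to remove duplicates. Note that std::unique only removes consecutive equal elements, so this works only when equal elements are already adjacent; without operator< the vector cannot be sorted to guarantee that.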
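A minimal sketch of both approaches, using only `operator==`. The helper names `push_back_unique` and `remove_all_duplicates`, and the `Widget` type, are hypothetical and exist only for this illustration. Because std::unique handles only adjacent duplicates and the elements cannot be sorted, the second helper uses a quadratic pass instead.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library type: equality only, no ordering.
struct Widget {
    int id;
    friend bool operator==(const Widget& a, const Widget& b) { return a.id == b.id; }
    friend bool operator!=(const Widget& a, const Widget& b) { return !(a == b); }
};

// Appends value only if no equal element is present. Returns true if it
// was appended. O(N) per call.
template <typename T>
bool push_back_unique(std::vector<T>& v, const T& value) {
    if (std::find(v.begin(), v.end(), value) != v.end()) return false;
    v.push_back(value);
    return true;
}

// Removes all duplicates (not just adjacent ones), keeping the first
// occurrence of each element and preserving order. O(N^2) comparisons.
template <typename T>
void remove_all_duplicates(std::vector<T>& v) {
    auto kept_end = v.begin();  // [begin, kept_end) holds the unique prefix
    for (auto it = v.begin(); it != v.end(); ++it) {
        if (std::find(v.begin(), kept_end, *it) == kept_end) {
            if (kept_end != it) *kept_end = std::move(*it);
            ++kept_end;
        }
    }
    v.erase(kept_end, v.end());
}

// Usage example.
void unique_vector_example() {
    std::vector<Widget> v;
    assert(push_back_unique(v, Widget{1}));
    assert(push_back_unique(v, Widget{2}));
    assert(!push_back_unique(v, Widget{1}));  // duplicate is not appended
    assert(v.size() == 2);

    std::vector<Widget> w{{1}, {2}, {1}, {3}, {2}};
    remove_all_duplicates(w);  // leaves ids 1, 2, 3
    assert(w.size() == 3);
}
```

Checking on every insertion keeps the vector unique at all times; the bulk pass is an alternative if duplicates are tolerable until all insertions are done.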

101010
  • That's what I'm actually doing right now. I was just surprised the STL did not provide a container doing this automatically... I'll have to keep this solution then! Thanks. – jpo38 Aug 23 '16 at 14:20
  • @jpo38 I don't see any reason why the STL should support a data structure like this, since there are `std::set` and `std::unordered_set` for it. IMHO it's not a big deal to provide an `operator<` for your stuff. – 101010 Aug 23 '16 at 14:27
  • `std::unique` will only work if the equivalent elements are consecutive, right? – In78 Jan 11 '22 at 14:32