How is C++'s std::set class able to implement a binary tree for ANY type of data structure?

Question

I understand how binary trees are implemented for most native elements such as ints or strings. So I can understand an implementation of std::set that would have a constructor like

switch(typeof(T)) // T being the typename/class in the implementation 
{
  case int: 
  {
      /* create a binary tree structure that uses the bitshift operator to 
         add elements, e.g. 13=1101 is created as
                                      /
                                     /
                                    /
                                   /
                                  1
                                 /
                                /
                               /
                              1
                               \
                                \
                                 0
                                /
                               1
      */
  }
  case string: 
  {
      /* Do something where the string is added to a tree by going letter-by-letter 
         and looking whether the letter is in the second half of the alphabet (?)
      */
  }
  // etcetera for every imaginable type
}

but obviously this is not how std::set is actually implemented, because it is able to create a tree even when I use a homemade data structure like

struct myStruct
{
      char c; 
      bool b;
};

std::set<myStruct> mySet;

Could it be possible to create a generic binary tree class that looks at all the bits of a data structure and does something like the int case I mentioned above?

For instance, in the case of myStruct, the size of the structure is 2 bytes, i.e. 16 bits, so a myStruct element S with S.c = '!' and S.b = true could look like

00100001 00000001
(c part) (b part)

= 

                             \
                              \
                               0
                                \
                                 \
                                  0
                                 /
                                /
                               1
                                \
                                 \
                                  0
                                   \
                                 [etcetera]

since the ASCII value for '!' is 33 (0x21) and a bool = true as an int is 1. Then again, this could be inefficient to do generically because a very large data structure would correspond to a gigantic tree that might take more time to traverse than just doing a basic linear search on the elements.

Does that make sense? I'm truly confused and would love it if some people here could set me straight.

have you **tried** your `myStruct` example. if that worked, which compiler did you use. — Cheers and hth. - Alf, Apr 24 '14 at 17:18
I don't think you understand how balanced binary trees work. Adding only `13` in an `std::set` would never give the path you describe! Sounds like you're describing a [trie](http://en.wikipedia.org/wiki/Trie) (prefix tree). — André Caron, Apr 24 '14 at 17:25
Any C++ compiler will compile the `myStruct` snippet posted - however it'll fail miserably once you try to actually insert an object into the set since there's no way to get an ordering for two `myStruct` objects. — Michael Burr, Apr 24 '14 at 17:27

score 6 · Accepted Answer · edited May 23 '17 at 10:32

What you want is a good book on templates and template meta-programming.

In short, std::set is only a class template, a blueprint for a class, which is then instantiated at compile time using the provided template arguments (a key type Key, plus a comparator and an allocator, which default to std::less<Key> and std::allocator<Key> if not given).
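To make the default arguments concrete, here is a small sketch. The helper names largest_first and smallest_first are made up for illustration; only the standard-library names are real.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <set>
#include <type_traits>

// The comparator and allocator are default template arguments, so
// std::set<int> names exactly the same type as the spelled-out form.
static_assert(
    std::is_same<std::set<int>,
                 std::set<int, std::less<int>, std::allocator<int>>>::value,
    "std::set<int> defaults to std::less<int> and std::allocator<int>");

// Hypothetical helper: the same keys, ordered by a non-default comparator.
inline int largest_first() {
    std::set<int, std::greater<int>> s{1, 3, 2};
    return *s.begin();  // std::greater puts the largest key first
}

// Hypothetical helper: the default comparator puts the smallest key first.
inline int smallest_first() {
    std::set<int> s{1, 3, 2};
    assert(s.size() == 3);
    return *s.begin();
}
```

Swapping the second template argument changes the ordering without touching the tree code at all, which is the point: the tree only ever asks the comparator which of two keys comes first.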

A big part of the flexibility comes from being able to create partial specialisations and using other templates and default arguments.

Now, std::less<Key> simply calls operator<, which exists for all basic types and many standard-library types, but not for custom types unless you provide it.

There are 3 ways to provide the comparison std::set needs:

1. Override the default template argument and provide your own comparator type to the template (if the comparator has state, it might make sense to pass an object to the constructor).
2. Specialise std::less.
3. Add a comparison operator (operator<).
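A rough sketch of the three options side by side. All struct names, the comparator name and the distinct_count helper are invented for this illustration, and the "c first, then b" ordering is just one reasonable choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>

struct ByComparator     { char c; bool b; };  // option 1: custom comparator type
struct BySpecialisation { char c; bool b; };  // option 2: specialise std::less
struct ByOperator       { char c; bool b; };  // option 3: operator<

// Option 1: a comparator passed as the second template argument.
struct CompareByComparator {
    bool operator()(const ByComparator& l, const ByComparator& r) const {
        return l.c != r.c ? l.c < r.c : l.b < r.b;
    }
};

// Option 2: a specialisation of std::less for the user-defined type.
namespace std {
template <>
struct less<BySpecialisation> {
    bool operator()(const BySpecialisation& l, const BySpecialisation& r) const {
        return l.c != r.c ? l.c < r.c : l.b < r.b;
    }
};
}  // namespace std

// Option 3: a free operator<, which the default std::less picks up.
inline bool operator<(const ByOperator& l, const ByOperator& r) {
    return l.c != r.c ? l.c < r.c : l.b < r.b;
}

// Hypothetical helper: inserts three values (one of them a duplicate)
// and reports how many distinct elements the set kept.
template <typename Set>
std::size_t distinct_count() {
    using T = typename Set::value_type;
    Set s;
    s.insert(T{'b', false});
    s.insert(T{'a', true});
    s.insert(T{'a', true});       // equivalent to the previous one: rejected
    assert(s.begin()->c == 'a');  // smallest key comes first
    return s.size();
}
```

All three sets behave the same way here. In practice option 3 is usually the least surprising when the type has one natural ordering, and option 1 when it does not.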

answered Apr 24 '14 at 17:19 by Deduplicator · edited May 23 '17 at 10:32 by Community

Somehow I don't feel this addresses the question at all, since it does not resolve OP's confusion about self-balancing binary search trees vs. tries – Niklas B. Apr 24 '14 at 19:37
@NiklasB. That was not the question, only a sideshow. – Deduplicator Apr 24 '14 at 19:39
There are two question marks: "Could it be possible to create a generic binary tree class that [...] does something like the int case I mentioned above?" and "Does that make sense?". There is no question about how to use `set`, but maybe I'm just misinterpreting the original post. – Niklas B. Apr 24 '14 at 19:41
Just about everyone (including the OP) regarded that only as something to apply the question to. (Questions should not contain a second unrelated one anyway). – Deduplicator Apr 24 '14 at 19:49

score 4 · Answer 2 · answered Apr 24 '14 at 17:25

Let's try out your example:

#include <set>

struct myStruct {
    char c;
    bool b;
};

int main() {
    std::set<myStruct> mySet;
    mySet.insert(myStruct());
}

If we compile this, we actually get an error. I've reduced the error messages to the interesting part and we see:

.../__functional_base:63:21: error: invalid operands to binary expression ('const myStruct' and 'const myStruct')
    {return __x < __y;}

We can see here that std::set, to do the work it needs to do, needs to be able to compare these two objects against each other. Let's implement that:

bool operator<(myStruct const & lhs, myStruct const & rhs) {
    if (lhs.c < rhs.c)
        return true;
    if (lhs.c > rhs.c)
        return false;
    return lhs.b < rhs.b;
}

Now the code will compile fine.

All of this works because std::set<T> expects to be able to compare two T objects via std::less<T> which attempts to do (T) lhs < (T) rhs.
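As a side note, the same lexicographical comparison can be written more compactly with std::tie from <tuple>. This is only an alternative sketch of the operator above, and make_example_set is a made-up helper used to exercise it.

```cpp
#include <cassert>
#include <set>
#include <tuple>

struct myStruct {
    char c;
    bool b;
};

// Same ordering as the hand-written version: compare c first, then b.
// std::tie builds tuples of references, and tuples compare lexicographically.
inline bool operator<(myStruct const& lhs, myStruct const& rhs) {
    return std::tie(lhs.c, lhs.b) < std::tie(rhs.c, rhs.b);
}

// Hypothetical helper: fills a set with a few values, one of them twice.
inline std::set<myStruct> make_example_set() {
    std::set<myStruct> s;
    s.insert(myStruct{'b', false});
    s.insert(myStruct{'a', true});
    s.insert(myStruct{'a', false});
    s.insert(myStruct{'a', true});  // equivalent to an earlier element: ignored
    assert(s.size() == 3);
    return s;
}
```

Iterating over the resulting set visits ('a', false), ('a', true), ('b', false) in that order, because the set keeps its elements sorted by operator<.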

Thanatos · Answer 3 · 2014-04-24T17:35:28.247

This is highly implementation specific: actual implementations can vary here. I hope to just give you an idea of how it works.

A binary tree typically will hold actual values at each spot in the tree: your diagram makes me think the values are only present at leaf nodes (are you thinking of a trie?). Consider a binary tree of strings, with members cat, duck, goose, and dog:

   dog
  /   \
cat   duck
         \
         goose

Note here that each node is a value that exists in the set. (Here, our set has 4 elements.) While perhaps you could do some sort of 0/1 prefix, you'd need to be able to convert the object to a bitstring (looking at the raw underlying bytes is not guaranteed to work), and it isn't really needed.

You need to understand templates in C++. Remember that a set<T> is "templated" on T; that is, T is whatever you specify when you use a set (a string for set<string>, your custom struct for set<MyStruct>, etc.). Inside the implementation of set, you might imagine a helper class like:

template<typename T>
struct node {
    T value;                // the element stored at this node
    node<T> *left, *right;  // children: smaller values left, larger values right
};

This structure holds a value and which node is to the left and right of it. set<T>, because it has T to use in its implementation, can use that to also template this node structure on the same T. In my example, the bit labeled "dog" would be a node, with value being a std::string with the value "dog", left pointing to the node holding "cat", and right pointing to the node holding "duck".

When you look up a value in a set, it looks through the tree for the value, starting at the root. The set can "know" which way to go (left or right) by comparing the value you're looking for / inserting / removing with the node it's looking at. (One of the requirements for a set is that whatever you template it on be comparable with <, or you give it a function to act in place of <: so, int works, std::string works, and MyStruct can work if you either define < or write a "comparator".)
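To tie this together, here is a deliberately simplified sketch of such a tree. The names tiny_set and make_animals are invented, and the tree is unbalanced; real std::set implementations typically use a self-balancing tree (commonly a red-black tree), so treat this as an illustration of the idea only.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Simplified, unbalanced binary search tree that works for any T
// for which the comparator (std::less<T> by default) is usable.
template <typename T, typename Compare = std::less<T>>
class tiny_set {
    struct node {
        T value;
        std::unique_ptr<node> left, right;
        explicit node(T v) : value(std::move(v)) {}
    };
    std::unique_ptr<node> root_;
    Compare less_;

public:
    // Returns true if the value was inserted, false if an equivalent
    // value was already present.
    bool insert(const T& v) {
        std::unique_ptr<node>* cur = &root_;
        while (*cur) {
            if (less_(v, (*cur)->value))      cur = &(*cur)->left;
            else if (less_((*cur)->value, v)) cur = &(*cur)->right;
            else return false;  // neither is less: the values are equivalent
        }
        *cur = std::unique_ptr<node>(new node(v));
        return true;
    }

    // Walks down from the root, going left or right based on the comparator.
    bool contains(const T& v) const {
        const node* cur = root_.get();
        while (cur) {
            if (less_(v, cur->value))      cur = cur->left.get();
            else if (less_(cur->value, v)) cur = cur->right.get();
            else return true;
        }
        return false;
    }
};

// Hypothetical helper: inserting in this order reproduces the
// dog / cat / duck / goose tree drawn above.
inline tiny_set<std::string> make_animals() {
    tiny_set<std::string> s;
    s.insert("dog");
    s.insert("cat");
    s.insert("duck");
    s.insert("goose");
    return s;
}
```

Notice that nothing in tiny_set looks at the bits of T. The only thing it ever asks of a value is "is this one less than that one?", which is why one template serves int, std::string and custom structs alike.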

laune · Answer 4 · 2014-04-24T17:40:23.177

-2

You can always compare two of a kind by comparing their byte array, no matter what.

So, if the set is represented as a sorted binary tree, a negative memcmp result indicates insert left, and a positive one says insert right.
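One caveat worth illustrating: a byte-wise order is a valid order, but it need not match the order the type's own operator< gives. The sketch below uses a made-up helper bytewise_less and assumes a two's-complement int, which is what essentially every current platform uses. For structs there is the further problem that padding bytes can hold arbitrary values.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: "less than" for two ints as memcmp would order them,
// i.e. by comparing their object representations byte by byte.
inline bool bytewise_less(int a, int b) {
    return std::memcmp(&a, &b, sizeof(int)) < 0;
}

// Hypothetical demo: numerically -1 < 1, but the bytes of -1 are all 0xFF
// (assuming two's complement), so byte-wise it sorts after 1.
inline bool orders_disagree() {
    const bool numeric  = (-1 < 1);
    const bool bytewise = bytewise_less(-1, 1);
    assert(numeric);
    return numeric != bytewise;
}
```

So a memcmp-based tree would still be a working tree, just sorted in an order that is rarely the one you want to iterate in.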

Later

I was so eager to show that there's no need to branch according to the bits of a set element that I did not consider that there's a restriction that requires a std::set element to implement operator<.

Am I forgiven?

answered Apr 24 '14 at 17:21 by laune · edited Apr 24 '14 at 17:40

Yes, you *can* compare two objects byte by byte - but that will hardly ever give you what you want, which is why C++ won't default to that behavior. If you haven't defined an `operator<` or specialization of `std::less`, you get a compile error. – Mark Ransom Apr 24 '14 at 17:28
Suppose you're trying to make a binary tree for character arrays of length 50. That's 3200 "steps" to lookup whether a character array is in the tree. Each of those steps involves a couple operations to check whether to proceed "right" or "left". So it's more like 10,000 steps. And of course you have to jump from one memory location to another. It probably wouldn't be as fast as just making a giant array to hold all the length-50 character arrays. That's why I don't get the obsession with programming interviews at Amazon/Google/etc. testing knowledge of binary trees and stuff like that. – user3566398 Apr 24 '14 at 17:30
@MarkRansom It's still a way to decide what goes left and what goes right, and is far from what OP indicates as a way to create a binary tree. And I didn't say that C++ does it this way. – laune Apr 24 '14 at 17:32
... and the question was "How is C++'s std::set class able..." – Commander Coriander Salamander Apr 24 '14 at 17:34
@user3566398 I beg your pardon? You are assuming, out of the blue, that a binary tree must be constructed according to the bit list of the node values, which isn't how it is done. Simply deciding what is less, equal or greater is sufficient. – laune Apr 24 '14 at 17:35
@CommanderCorianderSalamander OK, OK - so this bit string fantasy distracted my attention. – laune Apr 24 '14 at 17:37
@MarkRansom: There's a third method to provide the comparison, see my answer. – Deduplicator Apr 24 '14 at 17:38
@laune How does one come up with a generic way of creating an order relation on a data structure? That's what I'm confused about. – user3566398 Apr 24 '14 at 17:38
@user3566398 You typically implement `operator<` as a lexicographical comparison for your particular data structure. – Oktalist Apr 24 '14 at 17:42
@user3566398 As others have said, a binary tree doesn't work like that. You are thinking of a _trie_. A binary tree for `char[50]` doesn't contain 3200 of anything. It contains nothing, until you insert an element, then it contains one element. Insert another element, then it contains two elements... Arranged in a tree. – Oktalist Apr 24 '14 at 17:45
@user3566398: No need to be generic there, my answer should explain that. – Deduplicator Apr 24 '14 at 17:46
