I am searching for the most efficient tree search implementation in Python. I give the tree search a sequence of length n, and it should detect whether the matching branches already exist or, if they do not, create them.

Example:

i1: Sequence 1[0.89,0.43,0.28]

      0.89   check
       |
      0.43   check
       |
      0.28   check(last branch, last number of sequence == found)

i2: Sequence 2[0.89,0.43,0.99]

      0.89   check
       |
      0.43   check
       |                                           |
      0.28   missing(Creating new branch)         0.99

The order within the sequences is important.

The goal is to keep track of a huge number of sequences (seen and unseen).

Does anyone have any ideas?

  • [heapq](https://docs.python.org/3.5/library/heapq.html) may be helpful. It stores a binary heap in a plain list. – aluriak Feb 13 '17 at 16:10

1 Answer

You could use an infinitely nested collections.defaultdict for this. The following function creates a defaultdict that, whenever a requested key is not present, calls the same function again to create another defaultdict, ad infinitum.

import collections
nested = lambda: collections.defaultdict(nested)
dic = nested()

Now, you can add the sequences to the nested defaultdict. You can do this in a loop, or recursively, or simply use reduce:

s1 = [0.89,0.43,0.28]
s2 = [0.89,0.43,0.99]

from functools import reduce # Python 3
reduce(lambda d, x: d[x], s1, dic)
reduce(lambda d, x: d[x], s2, dic)

Afterwards, dic looks like this. (The actual output looks a bit different, but only because the repr of a defaultdict also shows the function it was created with.)

{0.89: {0.43: {0.28: {}, 0.99: {}}}}
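
One caveat: indexing a defaultdict inserts any missing key, so the reduce call above creates branches even when you only meant to look a sequence up. A read-only check could look like the following sketch. The helper names contains and to_plain are made up for illustration and are not part of collections.

```python
import collections
from functools import reduce

nested = lambda: collections.defaultdict(nested)
dic = nested()
reduce(lambda d, x: d[x], [0.89, 0.43, 0.28], dic)

def contains(d, seq):
    """Return True if seq exists as a path of nested keys, without inserting anything."""
    for x in seq:
        if x not in d:      # "in" does not trigger the default factory
            return False
        d = d[x]
    return True

def to_plain(d):
    """Recursively convert nested defaultdicts into plain dicts for printing."""
    return {k: to_plain(v) for k, v in d.items()}

print(contains(dic, [0.89, 0.43, 0.28]))  # True
print(contains(dic, [0.89, 0.43, 0.99]))  # False
print(to_plain(dic))                      # {0.89: {0.43: {0.28: {}}}}
```

Note that contains also returns True for a prefix of a stored sequence, such as [0.89, 0.43], because the tree does not record where a sequence ends.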

If by "the order of the sequences is important" you mean the order in which the sequences are added, and not the order within the sequences, you will have to use a collections.OrderedDict instead. (Since Python 3.7, plain dicts, and therefore defaultdicts, also preserve insertion order.) In this case, adding new elements is a bit more involved, but not by much.

dic = collections.OrderedDict()

def putall(d, s):
    for x in s:
        if x not in d:
            d[x] = collections.OrderedDict()
        d = d[x]

putall(dic, s1)
putall(dic, s2)
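
If you also need to know whether a sequence added anything new, a variant of putall could return a flag. This is only a sketch; the name putall_flag is hypothetical.

```python
import collections

def putall_flag(d, s):
    """Insert sequence s into the nested dict d.

    Returns True if at least one new branch had to be created.
    """
    created = False
    for x in s:
        if x not in d:
            d[x] = collections.OrderedDict()
            created = True
        d = d[x]
    return created

tree = collections.OrderedDict()
first = putall_flag(tree, [0.89, 0.43, 0.28])   # new sequence
second = putall_flag(tree, [0.89, 0.43, 0.99])  # new last element
third = putall_flag(tree, [0.89, 0.43, 0.28])   # already present
print(first, second, third)  # True True False

# Removing a branch is an ordinary nested delete.
del tree[0.89][0.43][0.99]
```

A sequence that is only a prefix of one already stored, such as [0.89, 0.43], also yields False here, since no new node is needed for it.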
tobias_k
  • Hi Tobias, nice solution. How can I see whether a new defaultdict was created because an input sequence had new values in it? And how can I delete existing defaultdicts? – abcdef123e Feb 13 '17 at 20:53
  • @abcdef123e Using a defaultdict, you can't really find out (aside from an in-depth-comparison of the states before and after the update). But using the second method, you could easily set a `bool` flag to `True` when the `if x not in d` branch was executed and return it at the end. About deleting elements/branches: `del dic[a][b][c]` should work fine. – tobias_k Feb 13 '17 at 20:59
  • The OrderedDict solution would be very nice if it considered the order within the sequences. I need something like this, but with the ability to keep track of the order of the sequences, so that the function is able to say "I've seen exactly this sequence x times before." Does anyone have an idea how to accomplish this? – abcdef123e Feb 13 '17 at 22:49
  • @abcdef123e It _does_ keep track of the order within the sequence. The first element is a key in the top-level dict, the second in the dict that is the value to that key, and so on. – tobias_k Feb 13 '17 at 23:05
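
For the "seen exactly this sequence x times before" part, one possible approach is to store a counter at the node where a sequence ends. The sketch below assumes a sentinel key for that counter; END and add_and_count are hypothetical names, not anything from the answer above.

```python
import collections

END = object()  # hypothetical sentinel key: "a sequence ends at this node"

def add_and_count(d, s):
    """Insert s and return how often exactly this sequence was seen before."""
    for x in s:
        # Descend into the existing child, or create it if it is missing.
        d = d.setdefault(x, collections.OrderedDict())
    seen_before = d.get(END, 0)
    d[END] = seen_before + 1
    return seen_before

tree = collections.OrderedDict()
sequences = [
    [0.89, 0.43, 0.28],
    [0.89, 0.43, 0.99],
    [0.89, 0.43, 0.28],  # exact repeat of the first sequence
    [0.89, 0.43],        # only a prefix, never added on its own before
]
counts = [add_and_count(tree, s) for s in sequences]
print(counts)  # [0, 0, 1, 0]
```

If the tree structure itself is not needed and only the counts matter, a collections.Counter keyed by tuple(s) would be a simpler alternative.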