
I want to use a Treap, but I am not very familiar with this type of tree.

I have two sets, and I want to write a method that compares them using a Treap. The method should return a value that represents the similarity of the two sets. (My goal is to retrieve the set that is most similar to an input set.)

How can I do this?

Thanks

SahelSoft

1 Answer


Treap

A Treap is an example of a Balanced Binary Search Tree (any of them can be used for this problem). The expected height of a Treap containing n elements is O(log n). The bound is only expected, because a Treap is a randomized data structure.

The following solution works for any Binary Search Tree, but it performs much better with a Balanced Binary Search Tree (e.g. a Treap), because an unbalanced tree can degenerate into a chain of height n.
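To illustrate why balance matters, here is a minimal sketch that inserts sorted keys into a plain BST and into a Treap and compares their heights. The helper names (`Node`, `bst_insert`, `treap_insert`, `rotate_left`, `rotate_right`, `height`) are hypothetical, and the sketch assumes the common convention of random priorities kept in max-heap order (a min-heap works equally well).

```python
import random

class Node:
    """A tree node; `priority` is only used by the treap."""
    def __init__(self, key, priority=0.0):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Plain (unbalanced) BST insertion, written iteratively."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                return root
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = node
                return root
            cur = cur.right
        else:
            return root  # key already present

def rotate_right(node):
    child = node.left
    node.left = child.right
    child.right = node
    return child

def rotate_left(node):
    child = node.right
    node.right = child.left
    child.left = node
    return child

def treap_insert(root, key, rng):
    """Insert as in a BST, then rotate the new node up while its
    priority is larger than its parent's (max-heap order)."""
    if root is None:
        return Node(key, rng.random())
    if key < root.key:
        root.left = treap_insert(root.left, key, rng)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    elif key > root.key:
        root.right = treap_insert(root.right, key, rng)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative)."""
    best = 0
    stack = [(root, 1)] if root else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child:
                stack.append((child, depth + 1))
    return best

rng = random.Random(42)
bst, treap = None, None
for k in range(1000):  # sorted input: the worst case for a plain BST
    bst = bst_insert(bst, k)
    treap = treap_insert(treap, k, rng)

print("plain BST height:", height(bst))  # 1000: a single chain
print("treap height:", height(treap))    # far smaller: O(log n) in expectation
```

The plain BST becomes a linked list on sorted input, so every find(x) or insert(x) costs O(n); the random priorities keep the Treap shallow regardless of insertion order.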

Measure

One measure of similarity between two sets is the Jaccard Index. Let's call our sets A and B. The Jaccard Index is defined as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

So to compute the Jaccard Index of A and B, we must compute the union (called "sum" below) and the intersection of A and B.
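As a quick worked example of the Jaccard Index |A ∩ B| / |A ∪ B|, here is a sketch using Python's built-in sets rather than a tree. The function name `jaccard` and the sample words are illustrative, and returning 1.0 for two empty sets is an assumed convention (the index is undefined there).

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard Index |A ∩ B| / |A ∪ B|.
    Two empty sets are treated as identical (an assumed convention)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

A = {"the", "cat", "sat", "down"}
B = {"the", "cat", "ran"}
# intersection = {the, cat} (2 elements), union has 5 distinct words
print(jaccard(A, B))  # 0.4
```

The value is 1.0 for identical sets and 0.0 for disjoint ones, so ranking candidate sets by this value retrieves the set most similar to the input.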

Operations

Let's assume that A and B are implemented as Balanced Binary Search Trees.

A Binary Search Tree can support many operations, but three of them are sufficient for this problem:

  • find(x) - returns true if and only if x is in the Tree
  • insert(x) - inserts x into the Tree, if x is not already in the Tree
  • size() - returns the number of elements in the Tree

In a Balanced Binary Search Tree, find(x) and insert(x) run in O(log n) time, where n is the number of elements in the Tree (for a Treap, this bound holds in expectation).

In addition, we can keep track of the size of the Tree during insertion, so size() can be implemented in constant time.

We also need to iterate over all elements of the Tree, which an in-order traversal does in O(n) time.
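The operations listed above could look like this in a minimal Treap sketch. The class name `Treap`, the `seed` parameter and the use of `__iter__` for in-order iteration are hypothetical choices for illustration, not a particular library's API.

```python
import random

class _Node:
    __slots__ = ("key", "priority", "left", "right")

    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None

class Treap:
    """Minimal treap supporting find, insert, size and in-order iteration."""

    def __init__(self, seed=None):
        self._root = None
        self._size = 0  # updated on each successful insert, so size() is O(1)
        self._rng = random.Random(seed)

    def find(self, x):
        """Return True if and only if x is in the tree."""
        node = self._root
        while node is not None:
            if x == node.key:
                return True
            node = node.left if x < node.key else node.right
        return False

    def insert(self, x):
        """Insert x if it is absent; return True if the tree changed."""
        self._root, added = self._insert(self._root, x)
        if added:
            self._size += 1
        return added

    def _insert(self, node, x):
        if node is None:
            return _Node(x, self._rng.random()), True
        if x == node.key:
            return node, False
        if x < node.key:
            node.left, added = self._insert(node.left, x)
            if node.left.priority > node.priority:  # rotate right
                child = node.left
                node.left, child.right = child.right, node
                node = child
        else:
            node.right, added = self._insert(node.right, x)
            if node.right.priority > node.priority:  # rotate left
                child = node.right
                node.right, child.left = child.left, node
                node = child
        return node, added

    def size(self):
        return self._size

    def __iter__(self):
        """In-order traversal (sorted order) with an explicit stack."""
        stack, node = [], self._root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right

t = Treap(seed=1)
for w in ["cat", "the", "sat", "cat"]:  # the duplicate "cat" is ignored
    t.insert(w)
print(t.find("sat"), t.find("dog"), t.size(), list(t))
# True False 3 ['cat', 'sat', 'the']
```

Counting only successful insertions keeps size() constant-time without a full traversal.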

Pseudocode

Step 1.

sum(A, B):

    C = copy(A)    // a copy, so that A itself is not modified

    foreach x in B:
        C.insert(x)

    return C

Step 2.

intersection(A, B):

    C = new BalancedBinarySearchTree()

    foreach x in B:
        if(A.find(x) == true):
            C.insert(x)

    return C

Step 3.

Calculate the Jaccard index of A and B:

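JaccardIndex(A, B):
    S = sum(A, B)
    I = intersection(A, B)

    if S.size() == 0:
        return 1    // both sets are empty; treating them as identical is a convention
    return I.size() / S.size()    // real (floating-point) division, not integer division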
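Putting the three steps together, here is a runnable sketch in Python. It re-declares a compact, hypothetical `Treap` class so the block stands alone; `union`, `intersection` and `jaccard_index` mirror the pseudocode steps. One assumption differs from the Complexity section below: `union` copies A by re-inserting its elements, which costs O(n log n) rather than the O(n) of a node-by-node clone.

```python
import random

class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None

class Treap:
    """Compact treap: find, insert, size and sorted iteration."""

    def __init__(self, items=(), seed=None):
        self.root, self._size = None, 0
        self.rng = random.Random(seed)
        for x in items:
            self.insert(x)

    def find(self, x):
        node = self.root
        while node is not None:
            if x == node.key:
                return True
            node = node.left if x < node.key else node.right
        return False

    def insert(self, x):
        self.root, added = self._insert(self.root, x)
        self._size += added  # a bool counts as 0 or 1
        return added

    def _insert(self, node, x):
        if node is None:
            return Node(x, self.rng.random()), True
        if x == node.key:
            return node, False
        if x < node.key:
            node.left, added = self._insert(node.left, x)
            if node.left.priority > node.priority:  # rotate right
                child = node.left
                node.left, child.right = child.right, node
                return child, added
        else:
            node.right, added = self._insert(node.right, x)
            if node.right.priority > node.priority:  # rotate left
                child = node.right
                node.right, child.left = child.left, node
                return child, added
        return node, added

    def size(self):
        return self._size

    def __iter__(self):
        stack, node = [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            yield node.key
            node = node.right

def union(a, b):
    """Step 1: copy A (so A is not modified), then insert every x in B."""
    c = Treap(a)
    for x in b:
        c.insert(x)
    return c

def intersection(a, b):
    """Step 2: keep the elements of B that are also found in A."""
    c = Treap()
    for x in b:
        if a.find(x):
            c.insert(x)
    return c

def jaccard_index(a, b):
    """Step 3: |A ∩ B| / |A ∪ B|; 1.0 for two empty sets (a convention)."""
    s = union(a, b)
    if s.size() == 0:
        return 1.0
    return intersection(a, b).size() / s.size()

A = Treap(["the", "cat", "sat", "down"], seed=1)
B = Treap(["the", "cat", "ran"], seed=2)
print(jaccard_index(A, B))  # 2 common words out of 5 distinct words: 0.4
```

If only the index is needed, the union tree does not have to be built at all: |A ∪ B| = |A| + |B| - |A ∩ B|, so the intersection step plus the two stored sizes is enough.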

Complexity

Let's assume that:

n = A.size()
m = B.size()

Then the complexity of computing the sum is O(n + m * log(n + m)), and the complexity of calculating the intersection is O(m * log n).
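The bounds can be derived step by step, assuming that C = copy(A) in sum is a node-by-node clone costing O(n), and that find and insert cost O(log k) on a tree with k elements:

```latex
\begin{aligned}
T_{\text{sum}} &= \underbrace{O(n)}_{\text{copy } A} + \sum_{i=1}^{m} O\big(\log(n+i)\big) = O\big(n + m\log(n+m)\big),\\
T_{\text{intersection}} &= \underbrace{m \cdot O(\log n)}_{\text{find in } A} + \underbrace{\min(n,m)\cdot O\big(\log \min(n,m)\big)}_{\text{insert into } C} = O(m \log n),\\
T_{\text{Jaccard}} &= T_{\text{sum}} + T_{\text{intersection}} = O\big(n + m\log(n+m)\big).
\end{aligned}
```

The intersection tree C never holds more than min(n, m) elements, which is why its insertions are dominated by the m lookups in A.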

pkacprzak
  • Thanks for your answer, but I wanted to compare two trees based on both the content and the order of the elements (my goal in using a Treap is to put elements with more weight higher in the tree, so that in the comparison step both content and height affect the similarity value). – SahelSoft Jun 16 '13 at 14:48
  • @roosta how do you want to include the height of an element in the similarity measure? The height of a node in a Treap is a random variable and I cannot see how you would use it. – pkacprzak Jun 16 '13 at 18:06
  • You're right, but I wanted to use a tree that takes the weight and order of elements into account in the similarity measure. I thought it was possible to use a Treap for this. – SahelSoft Jun 16 '13 at 20:05
  • @roosta Which element do you want to have the largest weight? Maybe if you tell us what you are trying to achieve, then I or someone else could answer the question. – pkacprzak Jun 16 '13 at 22:33
  • For example, the main verb of a sentence has a larger weight than the others (and the other elements of a sentence may have the same or different weights). – SahelSoft Jun 17 '13 at 10:51