
I need to aggregate some trees into clusters of "similar" trees, but I don't know how to define a distance between two different trees. For the clustering algorithm, my first bet is k-means, but I am not sure about that choice.
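
k-means has to average points, and an average of trees is not well defined. A method that only needs pairwise distances may therefore fit better. One such method is k-medoids, where each cluster is represented by one of its own members. Below is a minimal alternating (Voronoi-iteration) k-medoids sketch. The function name `k_medoids`, its naive "first k items" initialization and the `dist` parameter (whatever tree distance you end up defining) are assumptions for illustration, not a library API.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def k_medoids(items: Sequence[T], dist: Callable[[T, T], float], k: int,
              max_iter: int = 100) -> List[int]:
    """Return a cluster label (0..k-1) for each item.

    Hypothetical helper: like k-means, but each cluster centre is an actual
    item (the medoid), so only pairwise distances are ever needed.
    """
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(items)")
    # Precompute the full pairwise distance matrix once.
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    # Naive deterministic start (an assumption): the first k items are the medoids.
    medoids = list(range(k))
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid.
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        # Update step: in each cluster, pick the member with the smallest
        # total distance to the other members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    return labels
```

Any `dist(a, b)` works here, including a tree distance. Hierarchical clustering is another option that needs only distances. An example is SciPy's `scipy.cluster.hierarchy.linkage` applied to a condensed distance matrix.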

I need to evaluate both the topological difference between trees and the data distance. Each node contains a value, so two trees with the same structure can still have different values, and in that case they are considered different.

My question is very close to this one: Clustering tree structured data

But I do not want to cluster stack traces. I want to cluster real trees. What I am not able to do is write a distance function that takes into account both the layout and the content of each node. I am not asking which distance function is good for my scenario, but what the right pattern is for reaching that goal.
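
A common pattern for combining layout and content is a tree edit distance with a pluggable cost function. Topology enters through the cost of inserting or deleting nodes. Content enters through the cost of relabelling one node as another. The sketch below is a simplified, top-down variant in the spirit of Selkow's distance: roots are always matched, and a whole subtree is inserted or deleted at once. The general Zhang–Shasha algorithm relaxes that restriction. The names `Node`, `relabel_cost`, `kind_weight` and `value_weight`, and the weight values, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str
    value: float
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    # Number of nodes in the subtree; used as the cost of inserting/deleting it.
    return 1 + sum(size(c) for c in t.children)


def relabel_cost(a: Node, b: Node, kind_weight: float = 1.0,
                 value_weight: float = 0.1) -> float:
    # Content part (weights are arbitrary assumptions): a kind mismatch costs
    # kind_weight, plus a cost proportional to the gap between the values.
    return (kind_weight if a.kind != b.kind else 0.0) + value_weight * abs(a.value - b.value)


def tree_distance(a: Node, b: Node) -> float:
    # The two roots are matched, so only their content can differ.
    cost = relabel_cost(a, b)
    ca, cb = a.children, b.children
    # Levenshtein-style DP over the two ordered child lists:
    # dp[i][j] = cost of turning the first i children of a into the first j of b.
    dp = [[0.0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        dp[i][0] = dp[i - 1][0] + size(ca[i - 1])  # delete a whole subtree
    for j in range(1, len(cb) + 1):
        dp[0][j] = dp[0][j - 1] + size(cb[j - 1])  # insert a whole subtree
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(ca[i - 1]),
                dp[i][j - 1] + size(cb[j - 1]),
                # Match two children and recurse into their subtrees.
                dp[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),
            )
    return cost + dp[len(ca)][len(cb)]
```

The balance between "same shape, different values" and "different shape" is then controlled entirely by the cost function. That function is where the application-specific decisions go.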

Asked by Skary; edited by Community.
  • k-means needs to compute *means*. How would you compute the mean of two trees? I'd say k-means is your *last* bet... Also, are you sure you are talking about [*real* trees](https://en.wikipedia.org/wiki/Tree)? ;-) Seriously: you need a distance that quantifies *your* application's notion of similarity. – Has QUIT--Anony-Mousse Sep 28 '15 at 06:07
  • I have trees that represent the hierarchical organization of a structure. I need to compare how two structures are organized, to see whether they are similar (and then decide whether to cluster them). But I really have no idea what the best approach is. I had assumed clustering plus tree distances, but I am open to suggestions. – Skary Sep 28 '15 at 07:48
  • It really depends on your data. There is no general solution for quantifying tree similarity. What works for e.g. XML document trees may be total nonsense on your data. For example, XML document trees have an *order* among the children. – Has QUIT--Anony-Mousse Sep 28 '15 at 09:29
  • Can you give me some examples or ideas on how XML is handled? It is not my case, but it may help me understand the situation. In my scenario the tree is ordered, and each child (and the root) has a kind and a value. What I want is to understand how to write a distance function, or whether some existing distance function works well in this general case. For example, I could flatten the tree and then use the Levenshtein distance between the resulting lists. – Skary Sep 28 '15 at 10:09
  • No: I don't use XML, and I have never felt the urge to measure similarity of trees. This is something you really need to figure out yourself, sorry about that. – Has QUIT--Anony-Mousse Sep 28 '15 at 12:28
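
The flatten-then-Levenshtein idea from the comments can be sketched as follows. A plain preorder list loses structure, because different shapes can produce the same sequence. Recording each node's depth keeps the encoding unambiguous for ordered trees. The tuple encoding `(kind, value, children)`, the helper names and the weights in `token_cost` are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical encodings for this sketch.
Tree = Tuple[str, float, list]   # (kind, value, [child trees])
Token = Tuple[int, str, float]   # (depth, kind, value)


def flatten(tree: Tree, depth: int = 0) -> List[Token]:
    # Preorder traversal that records each node's depth alongside its content.
    kind, value, children = tree
    out = [(depth, kind, value)]
    for child in children:
        out.extend(flatten(child, depth + 1))
    return out


def token_cost(a: Token, b: Token) -> float:
    # Cost of substituting one node for another; depth (layout), kind and value
    # all contribute. The weights are arbitrary assumptions.
    return (0.5 * abs(a[0] - b[0])
            + (1.0 if a[1] != b[1] else 0.0)
            + 0.1 * abs(a[2] - b[2]))


def levenshtein(xs: Sequence[Token], ys: Sequence[Token],
                sub_cost: Callable[[Token, Token], float],
                indel_cost: float = 1.0) -> float:
    # Weighted edit distance between two sequences, keeping one DP row at a time.
    prev = [j * indel_cost for j in range(len(ys) + 1)]
    for i in range(1, len(xs) + 1):
        cur = [i * indel_cost] + [0.0] * len(ys)
        for j in range(1, len(ys) + 1):
            cur[j] = min(prev[j] + indel_cost,        # delete xs[i-1]
                         cur[j - 1] + indel_cost,     # insert ys[j-1]
                         prev[j - 1] + sub_cost(xs[i - 1], ys[j - 1]))  # substitute
        prev = cur
    return prev[-1]
```

This is cheaper and simpler than a true tree edit distance. However, an edit on the flat list does not always correspond to a valid edit on the tree, so the two measures can disagree.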

0 Answers