4

I searched the forums but I couldn't figure it out if the other similar questions are necessarily related to this one.

What I'm trying to do is match a subtree to a tree of objects.

I know there are pattern matching algorithms based on suffix trees or automata, but I'm not sure whether they apply here.

Example

I am trying to match the subtree given by the red nodes in the picture against the bigger tree, regardless of the overall structure of the tree or whether the red nodes have children or not.

The reason straightforward pattern matching does not work is that no ordering of the nodes (post/preorder, breadth) will be usable.

So I'm thinking of writing a recursive algorithm that starts from the root of the subtree and tries to match nodes and then their children.

I was wondering if there exists any such (efficient algorithm). Apologies if this has already been asked.

Bogdan B
  • 485
  • 5
  • 15
  • 3
    It's still not clear what exactly you're trying to do. Perhaps you could add example inputs and expected outputs? – Antimony Jul 07 '12 at 16:23
  • Nodes in the tree are objects, let's say A, B, C... They can be compared based on their attributes so we have ordering and equality. Expected input is a fragment like the red one above and a complete tree. Expected output is the positions in which the fragment is found in the tree. – Bogdan B Jul 07 '12 at 17:49

2 Answers2

3

Here are a few references that may help you:

Aho's Efficient Tree Pattern Matching: an Aid to Code Generation

Stanford's Tree pattern matching programs

From IMECS 2011: Tree Pattern Matching Algorithm Using a Succinct Data Structure

Doug Currie
  • 40,708
  • 1
  • 95
  • 119
3

It appears that what I was looking for was an algorithm to solve the "tree inclusion problem". I have found some useful documents:

Fast Algorithms for Finding Nearest Common Ancestors

The Tree Inclusion Problem: In Optimal Space and Faster

Tree Isomorphism And Related Problems

I translated one of the algorithms from the last paper into C# (returns the number of pairs in a largest matching between first-level subtrees of a and b).

public static int Match(Node a, Node b, NodeSimilarityComparer comp) {
  if (!comp.Equals(a, b))
    return 0;
  int m = a.SubtreeCount;
  int n = b.SubtreeCount;
  var matrix = new int[m + 1, n + 1];

  for (int i = 0; i <= m; ++i)
    matrix[i, 0] = 0;

  for (int j = 0; j <= n; ++j)
    matrix[0, j] = 0;

  for (int i = 1; i <= m; ++i)
    for (int j = 1; j <= n; ++j) {
      var ai = a.GetSubtree(i - 1);
      var bj = b.GetSubtree(j - 1);
      var match = Match(ai, bj, comp);
      matrix[i, j] = Math.Max(Math.Max(matrix[i, j - 1], matrix[i - 1, j]), matrix[i - 1, j - 1] + match);
    }
  return matrix[m, n] + 1;
}

Hope this helps others too.

Bogdan B
  • 485
  • 5
  • 15