1

I've implemented the CYK parsing algorithm which uses a bottoms-up approach to build a parse tree. As it traverses the algorithm, the path to the ultimate solution is stored in backpointers. From the backpointers, we construct the tree. This final step is what I am having issues with.

This is the data structure I'm using to store the tree:

class GrammarTree(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    def insertLeft(self, new_node):
        self.left = GrammarTree(new_node)

    def insertRight(self, new_node):
        self.right = GrammarTree(new_node)

The following is how I build the tree, where back stores a tuple where split is the index used to split the tree and left_rule and right_rule are the rules for the respective tree represented by int. If a leaf node is reached there is no tuple, just an int representing the terminal rule.

def build_tree(start,end,idx,back):
    tree = GrammarTree(idx)
    node = back[start][end][idx]
    if isinstance(node,tuple):
        split,left_rule,right_rule = node
        tree.insertLeft(build_tree(start,split,left_rule,back))
        tree.insertRight(build_tree(split,end,right_rule,back))
        return tree
    else:
        tree.insertLeft(GrammarTree(node))
        return tree 

The problem is that when function is done building the tree, there are extra branches being added, i.e. the nodes aren't being properly glued together.

This is what it looks like:

Lvl0                                root
                          /                            \
Lvl1                     L1                             R1
                  /       |    \             /           |       \
                 /        |     \           /            |        \
                /         |      \         /             |         \
               /          |       \       /              |          \
              /           |        \     /               |           \
Lvl2  L1.left=None L1.right=None L1.data R1.left=None R1.right=None R1.data
                                  /    \                            /    \
Lvl3                             L2     R2                         L3     R3

There shouldn't be a data node between the trees.

Edit:

The problem is not that there is an extra data node (above statement is wrong), it's that after Lvl1, instead of new branches being added to L1.left/right and R1.left/right on Lvl2, they are added to L1 and R1's data fields. So L1/R1.data ends up being a tree in and of itself and L1.left/right and R1.left/right are not used and therefore None.

It should look like this:

                  root
           /               \
          /                 \
   L1=root.left         R1=root.right
     /    \                 /   \
    /      \               /     \
   /        \             /       \
L2=L1.left  R2=L1.right L3=R1.left R3=R1.right

This is where I call build tree:

back = [[[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (1, 6, 7), 0, 3, 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (1, 7, 3), 0, (1, 7, 7)], [0, 0, 0, (1, 6, 7), 0, (2, 7, 3), 0, (1, 7, 7)]],\ 
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (2, 7, 7)], [0, 0, 0, (2, 6, 7), 0, (2, 7, 3), 0, (3, 7, 7)]],\
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2], [0, 0, 0, (3, 6, 7), 0, 3, 0, (3, 7, 7)]],\
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 6, 0, 3, 0, 2]],\
        [[0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0]]]
build_tree(0,4,5,back)
mmera
  • 327
  • 5
  • 17
  • Please read and follow the posting guidelines in the help documentation. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. I don't see anything defined as a `Data` node, and there's no driver code to provoke the problem. – Prune Jun 07 '17 at 18:56
  • @Prune So I made some edits to reflect the actual problem I'm having. As far as posting the entire code...it's probably too large for that but I'm confident that there is enough to abstract the problem I'm having. – mmera Jun 07 '17 at 20:26
  • You post a MCVE, not your entire code. Most of us (including me) will *not* desk-check your code, nor analyze it in the abstract. We use our favourite debugging techniques to track the data and control flow. – Prune Jun 07 '17 at 20:59
  • @Prune Understood. I believe the code is to the MCVE specification now. I ran the example exactly as posted. – mmera Jun 08 '17 at 05:12
  • Your code now runs, but produces no output: there's no tracking or display data. Also, you're assuming that anyone available to help with this knows your CYK terminology and function. I'm good with trees, but I'm afraid that CYK post-dates my grad work in computational linguistics. You haven't mapped your variable names to common concepts; you haven't mapped you input to a problem (grammar); you haven't traced your output. \\\ Yes, it's *possible** for someone to help you with this, but you've left us with an overhead of perhaps half an hour just to understand the issue. – Prune Jun 08 '17 at 15:38

1 Answers1

0

The problem was in the insertLeft() and insertRight() methods of the GrammarTree class. Instead of simply connecting the branches, you'll seet that I was calling the GrammarTree constructor so I was essentially wrapping a tree inside another tree.

I fixed the problem by removing the call to the constructor.

class GrammarTree(object):
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

    def insertLeft(self, new_node):
        self.left = new_node  ## <- NOT GrammarTree(new_node) 

    def insertRight(self, new_node):
        self.right = new_node ## <- NOT GrammarTree(new_node)
mmera
  • 327
  • 5
  • 17