I have a string containing a sentence parse, and I want to extract the substrings that are enclosed in a pair of opening and closing brackets. The catch is that the enclosed text contains other brackets of the same type (parentheses in this case), and those also need to be captured. So basically, for each NP I need the number of opening parentheses to equal the number of closing parentheses.
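The counting rule described above can be sketched as a small depth counter in base R. This is a minimal sketch, not part of the original question: `balanced_from()` is a hypothetical helper name, and it assumes parentheses are the only bracket type and the text is plain ASCII.

```r
# Hypothetical helper (illustration only): given a string `s` and the
# character index `start` of an opening "(", return the substring up to and
# including its matching ")" by tracking the nesting depth.
balanced_from <- function(s, start) {
  chars <- strsplit(s, "")[[1]]
  stopifnot(chars[start] == "(")
  depth <- 0
  for (i in seq(start, length(chars))) {
    if (chars[i] == "(") {
      depth <- depth + 1   # one level deeper
    } else if (chars[i] == ")") {
      depth <- depth - 1   # one level shallower
    }
    # Back at depth 0: every "(" seen since `start` has been closed.
    if (depth == 0) return(substr(s, start, i))
  }
  NA_character_  # reached the end without closing: unbalanced input
}

s <- "(NP (NNP Clifford)) (VP (VBD ate))"
balanced_from(s, 1)   # "(NP (NNP Clifford))"
balanced_from(s, 21)  # "(VP (VBD ate))"
```

Finding the start positions (for example with `gregexpr("(NP ", x, fixed = TRUE)`) and passing each one to such a counter would also return NPs nested inside other NPs, since every start is handled independently.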
In this example:
x <- "(TOP (S (NP (NNP Clifford)) (NP (DT the) (JJ big) (JJ red) (NN dog)) (VP (VBD ate) (NP (PRP$ my) (NN lunch)))(. .)))"
Let's say I want to extract the noun phrases (NP) into the three substrings below:
(NP (NNP Clifford))
(NP (DT the) (JJ big) (JJ red) (NN dog))
(NP (PRP$ my) (NN lunch))
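One possible way to get exactly these three substrings is a recursive regular expression, which R's PCRE engine supports when `perl = TRUE` is passed to `gregexpr()`. This is a sketch of one approach, not the only one: the pattern matches `(NP` plus whitespace, then any mix of non-parenthesis characters and balanced `( ... )` groups, then the closing `)`. Capture group 1 is the balanced group, and `(?1)` recurses into it, so the nesting depth is not limited.

```r
x <- "(TOP (S (NP (NNP Clifford)) (NP (DT the) (JJ big) (JJ red) (NN dog)) (VP (VBD ate) (NP (PRP$ my) (NN lunch)))(. .)))"

# In regex terms (before R string escaping): "(NP" + whitespace, then zero or more of
#   [^()]                    a character that is not a parenthesis, or
#   (\((?:[^()]|(?1))*\))    group 1: a balanced "( ... )" that may contain
#                            further balanced groups via the recursion (?1),
# and finally the ")" that closes the NP itself.
np_pattern <- "\\(NP\\s(?:[^()]|(\\((?:[^()]|(?1))*\\)))*\\)"

nps <- regmatches(x, gregexpr(np_pattern, x, perl = TRUE))[[1]]
nps
# "(NP (NNP Clifford))" "(NP (DT the) (JJ big) (JJ red) (NN dog))" "(NP (PRP$ my) (NN lunch))"
```

Because `gregexpr()` returns non-overlapping matches, an NP nested directly inside another NP would only come back as part of the outer match.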
This should then generalize to every other label in the string: if I wanted to grab the VP brackets instead, I could follow the same logic.
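A minimal sketch of that generalization, assuming the same recursive-regex idea works for any label: `extract_phrase()` is a hypothetical helper, not an existing function. It wraps the tag in PCRE's `\Q...\E` quoting so that labels containing regex metacharacters, such as `PRP$` or `.`, are matched literally.

```r
# Hypothetical helper (not from the question): return every bracketed
# constituent labelled `tag`, keeping its nested brackets balanced.
extract_phrase <- function(x, tag) {
  pattern <- paste0(
    "\\(\\Q", tag, "\\E\\s",               # "(" + literal tag + whitespace
    "(?:[^()]|(\\((?:[^()]|(?1))*\\)))*",  # text and balanced "( ... )" groups
    "\\)"                                  # the ")" closing this constituent
  )
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

x <- "(TOP (S (NP (NNP Clifford)) (NP (DT the) (JJ big) (JJ red) (NN dog)) (VP (VBD ate) (NP (PRP$ my) (NN lunch)))(. .)))"

extract_phrase(x, "VP")    # "(VP (VBD ate) (NP (PRP$ my) (NN lunch)))"
extract_phrase(x, "PRP$")  # "(PRP$ my)"
extract_phrase(x, "ADJP")  # character(0): no such label in x
```

As with any non-overlapping match, asking for a label that nests inside itself returns only the outermost occurrences.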