I finally figured out how to use the MaltParser wrapper provided in NLTK by following "How to use malt parser in python nltk", and I was able to parse my sentences successfully, but the output comes in a format I'm unfamiliar with.
For example, parsing "This is a test sentence" returns:
>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/path/to/dir", mco="engmalt.linear-1.7", additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
(This (sentence is a test))
Parsing a more complex sentence returns:
>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/path/to/dir", mco="engmalt.linear-1.7", additional_java_args=['-Xmx512m'])
>>> txt = "A ceasefire for east Ukraine has been agreed during talks in Minsk."
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
(agreed
(ceasefire A (for (Ukraine east)))
has
been
(during (talks (in Minsk)))
.)
Could someone please explain what this output format is, or how I can convert it so that the words keep their original sentence order, like this:
(This (is a test sentence))
A (ceasefire (for (east Ukraine))) has been (agreed (during (talks (in Minsk))).)
If it helps, graph is an nltk DependencyGraph and graph.tree() is an nltk Tree.
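To make the goal concrete, here is a rough sketch of the reordering I'm after. It assumes each node of the dependency graph can be reduced to a word, its 1-based position in the sentence (its address), and the address of its head, with 0 meaning "attached to the root". The dictionary below is a hand-built stand-in that mirrors the first tree above, not actual parser output, and bracket_in_sentence_order is a hypothetical helper, not part of NLTK.

```python
def bracket_in_sentence_order(nodes, address):
    """Render the subtree rooted at `address` with words in sentence order.

    `nodes` maps a 1-based word position (address) to a dict holding the
    word and the address of its head. This shape is an assumption made
    for illustration only.
    """
    # Direct dependents of this word, in sentence order.
    children = sorted(a for a, n in nodes.items() if n["head"] == address)
    if not children:
        return nodes[address]["word"]
    parts = []
    # Interleave the head word with its dependents by position, instead of
    # always printing the head first as the pprint() output above does.
    for a in sorted(children + [address]):
        if a == address:
            parts.append(nodes[a]["word"])
        else:
            parts.append(bracket_in_sentence_order(nodes, a))
    return "(" + " ".join(parts) + ")"


# Hand-built stand-in for "This is a test sentence", following the
# head/dependent structure of the tree (This (sentence is a test)).
nodes = {
    1: {"word": "This", "head": 0},
    2: {"word": "is", "head": 5},
    3: {"word": "a", "head": 5},
    4: {"word": "test", "head": 5},
    5: {"word": "sentence", "head": 1},
}

# The word whose head is 0 is the top of the tree.
root = next(a for a, n in nodes.items() if n["head"] == 0)
result = bracket_in_sentence_order(nodes, root)
print(result)  # (This (is a test sentence))
```

Ordering each subtree by the position of its head word only restores the original word order when no subtree is interleaved with another (a projective parse), so this is a starting point rather than a general solution.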
Thanks in advance.