
As part of my academic project I need to parse a bunch of arbitrary sentences into a dependency graph. After a lot of searching, I found that I can use MaltParser to parse text with its pre-trained grammar.

I have downloaded the pre-trained model (engmalt.linear-1.7.mco) from http://www.maltparser.org/mco/mco.html, but I don't know how to parse my sentences using this grammar file and MaltParser (through the Python interface for Malt). I have downloaded the latest version of MaltParser (1.7.2) and moved it to '/usr/lib/'.

import nltk
parser = nltk.parse.malt.MaltParser()
txt = "This is a test sentence"
parser.train_from_file('/home/rohith/malt-1.7.2/engmalt.linear-1.7.mco')
parser.raw_parse(txt)

After executing the last line, the following error message is displayed:

Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
parser.raw_parse(txt)
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 88, in raw_parse
return self.parse(words, verbose)
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 75, in parse
return self.tagged_parse(taggedwords, verbose)
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/malt.py", line 122, in tagged_parse
return DependencyGraph.load(output_file)
File "/usr/local/lib/python2.7/dist-packages/nltk-2.0b5-py2.7.egg/nltk/parse/dependencygraph.py", line 121, in load
return DependencyGraph(open(file).read())
IOError: [Errno 2] No such file or directory: '/tmp/malt_output.conll'

Please help me to parse that sentence using this malt parser.

Rohith
  • Note that the latest version of NLTK has this module patched up nicely, see http://stackoverflow.com/questions/33015326/maltparser-giving-error-in-nltk – alvas Oct 08 '15 at 15:09

1 Answer


Edited

Note that this answer no longer works, because the MaltParser API in NLTK was updated in August 2015. It is kept for legacy reasons.

Please see this answer to get MaltParser working with NLTK:

Disclaimer: This is not a permanent solution. The answer in the above link (posted in Feb 2016) works for now, but when the MaltParser or NLTK API changes, the syntax for using MaltParser in NLTK might change as well.


A couple of problems with your setup:

  • The input to train_from_file must be a file in CoNLL format, not a pre-trained model. For an mco file, you pass it to the MaltParser constructor using the mco and working_dir parameters.
  • The default Java heap allocation is not large enough to load that particular mco file, so you'll have to tell Java to use more heap space with the -Xmx parameter. Unfortunately this wasn't possible with the existing code, so I just checked in a change that adds a constructor parameter for additional Java arguments. See here.
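To make the heap-size point concrete, here is a minimal sketch of how such a Java invocation can be assembled. It mirrors the shape of the command line quoted in the error messages in the comments on this answer (`java -Xmx512m -jar ... -w ... -c ... -i ... -o ... -m parse`). The function name `build_malt_command` and its parameters are hypothetical and are not NLTK's actual code; reducing the model argument to a bare name is likewise an assumption based on how the model is passed in this answer.

```python
import os


def build_malt_command(jar_path, working_dir, mco, input_file, output_file,
                       additional_java_args=None):
    """Assemble a MaltParser command line in parse mode (hypothetical helper)."""
    # JVM options such as -Xmx512m go before -jar so that they apply to the
    # JVM itself rather than being handed to MaltParser as program arguments.
    java_args = list(additional_java_args or [])

    # Assumption of this sketch: reduce the model to a bare name, without
    # directory or ".mco" suffix, matching how this answer passes it.
    model_name = os.path.basename(mco)
    if model_name.endswith(".mco"):
        model_name = model_name[:-len(".mco")]

    return (["java"] + java_args +
            ["-jar", jar_path,
             "-w", working_dir,   # working directory holding the model
             "-c", model_name,    # model (configuration) name
             "-i", input_file,    # CoNLL input file
             "-o", output_file,   # CoNLL output file
             "-m", "parse"])      # run in parse mode


# Example: build the command with a 512 MB heap and print it for inspection.
cmd = build_malt_command("/usr/lib/malt.jar", "/home/rohith/malt-1.7.2",
                         "engmalt.linear-1.7.mco", "malt_input.conll",
                         "malt_output.conll", ["-Xmx512m"])
print(" ".join(cmd))
```

Printing the command like this also makes it easy to paste into a terminal and read MaltParser's own error output.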

So here's what you need to do:

First, get the latest NLTK revision:

git clone https://github.com/nltk/nltk.git

(NOTE: If you can't use the git version of NLTK, then you'll have to update the file malt.py manually or copy it from here to have your own version.)

Second, make the jar file available under the name malt.jar, which is what NLTK expects (here via a symlink):

cd /usr/lib/
ln -s maltparser-1.7.2.jar malt.jar

Then add an environment variable pointing to malt parser:

export MALTPARSERHOME="/Users/dhg/Downloads/maltparser-1.7.2"
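Before starting Python, it can help to confirm that the files from the steps so far are where they are expected to be. The following is a rough sketch only: `check_malt_setup` is a hypothetical helper, and it assumes that malt.jar sits directly under the directory named by MALTPARSERHOME and that the model file is called `<mco>.mco` inside the working directory, neither of which is stated explicitly in this answer.

```python
import os


def check_malt_setup(malt_home, working_dir, mco):
    """Return a list of problems found with a MaltParser setup (hypothetical helper)."""
    problems = []

    # Assumption: the jar is named malt.jar and lives directly in malt_home.
    jar = os.path.join(malt_home, "malt.jar")
    if not os.path.isfile(jar):
        problems.append("missing jar: " + jar)

    # Assumption: the model is stored as "<mco>.mco" in the working directory.
    model = os.path.join(working_dir, mco + ".mco")
    if not os.path.isfile(model):
        problems.append("missing model: " + model)

    return problems


if __name__ == "__main__":
    home = os.environ.get("MALTPARSERHOME", "")
    for problem in check_malt_setup(home, "/home/rohith/malt-1.7.2",
                                    "engmalt.linear-1.7"):
        print(problem)
```

An empty result means both files were found; it does not guarantee that the jar or the model is intact.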

Finally, load and use malt parser in python:

>>> import nltk
>>> parser = nltk.parse.malt.MaltParser(working_dir="/home/rohith/malt-1.7.2", 
...                                     mco="engmalt.linear-1.7", 
...                                     additional_java_args=['-Xmx512m'])
>>> txt = "This is a test sentence"
>>> graph = parser.raw_parse(txt)
>>> graph.tree().pprint()
'(This (sentence is a test))'
dhg
  • I'm getting an exception: `Exception: MaltParser parsing (java -Xmx512m -jar /usr/lib/malt-1.7.2/malt.jar -w /home/rohith/malt-1.7.2 -c /home/rohith/malt-1.7.2/engmalt.linear-1.7 -i /home/rohith/malt-1.7.2/malt_input.conllQh6zIp -o /home/rohith/malt-1.7.2/malt_output.conllm8yyes -m parse) failed with exit code 1` – Rohith Dec 28 '12 at 16:26
  • it looks like you gave the entire path as the `mco` parameter instead of just the filename. you need to just do `mco="engmalt.linear-1.7"`. – dhg Dec 28 '12 at 16:35
  • Getting the same exception in both cases: `Exception: MaltParser parsing (java -Xmx512m -jar /usr/lib/malt-1.7.2/malt.jar -w /home/rohith/malt-1.7.2 -c engmalt.linear-1.7 -i /home/rohith/malt-1.7.2/malt_input.conllDYGP0m -o /home/rohith/malt-1.7.2/malt_output.conllIyDCrc -m parse) failed with exit code 1` – Rohith Dec 28 '12 at 16:55
  • OK, run the command that the error gives you from the terminal and see what MaltParser is actually saying. (You'll have to edit `malt.py` to comment out the `os.remove` lines so that the files don't get deleted.) – dhg Dec 28 '12 at 17:00
  • It's working now. The problem was with my Malt files; I downloaded new files and now it works. Thank you. Also, could you provide any information about how to integrate the Stanford parser with NLTK/Python? There is an interface for Python, but I am unable to install the Stanford parser using the command "rake setup". – Rohith Dec 28 '12 at 17:04
  • The easiest thing would probably be to just create a wrapper in the same way as `malt.py` does. So download the Stanford parser, put the jar files somewhere, and make a python class to interact with it. – dhg Dec 28 '12 at 17:13
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/21838/discussion-between-rohith-and-dhg) – Rohith Dec 28 '12 at 17:48
  • I am getting the same error as mentioned in the 1st comment in this thread. More specifically, the error is: “Exception: MaltParser parsing (java -Xmx512m -jar /usr/local/bin/malt.jar -w /home/satarupa -c engmalt.linear-1.7 -i /home/satarupa/malt_input.conlldTd4uy -o /home/satarupa/malt_output.conllcgU_Kq -m parse) failed with exit code 1” I think I have followed everything mentioned in this answer. I have cloned via git to get NLTK, so I think malt.py should be fine. – Satarupa Guha Oct 30 '14 at 10:48
  • @dhg, sorry for the edit on your answer. I've given an updated answer to avoid multiple issues being posted on the NLTK GitHub repo by users who take the code snippet here and report that it breaks with the old API syntax. I hope you don't mind. – alvas Feb 28 '16 at 10:09