I'm trying to get the Stanford parser to work in my CoreNLP pipeline for German text, but it refuses to load the German parser model:

Properties props = new Properties();

props.put("annotators", "tokenize, ssplit, pos, parse");
props.put("ssplit.isOneSentence", "true");
props.put("pos.model", "pos-taggers/german-fast/german-fast.tagger");
props.put("pos.maxlen", "30");
props.put("parse.model", "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz");
props.put("encoding", "utf-8");

pipeline = new StanfordCoreNLP(props);

I still get the following output and nothing more because German tags are not recognized:

Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
Initializing lexicon scores ... The 15 open class tags are: [ TRUNC NE NN XY VVIZU ADV VVINF VVFIN VVPP CARD NN-OA ADJA FM ADJD NN-SB ] 

The failure trace:

java.lang.IllegalArgumentException: Unknown option: -retainTmpSubcategories
at edu.stanford.nlp.parser.lexparser.Options.setOption(Options.java:175)
at edu.stanford.nlp.parser.lexparser.Options.setOptions(Options.java:68)
at edu.stanford.nlp.parser.lexparser.Options.setOptions(Options.java:49)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.setOptionFlags(LexicalizedParser.java:841)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:159)
at edu.stanford.nlp.parser.lexparser.LexicalizedParser.loadModel(LexicalizedParser.java:143)
at edu.stanford.nlp.pipeline.ParserAnnotator.loadModel(ParserAnnotator.java:176)
at edu.stanford.nlp.pipeline.ParserAnnotator.<init>(ParserAnnotator.java:106)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$12.create(StanfordCoreNLP.java:734)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:81)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:261)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:127)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:123)
at da.utils.nlp.SentimentExtractor.initPipeline(SentimentExtractor.java:111)
at da.utils.nlp.SentimentExtractor.coreAnnotate(SentimentExtractor.java:117)
at da.utils.nlp.SentimentExtractorTest.testCoreAnnotate(SentimentExtractorTest.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

Any idea what may be wrong in my implementation?

I checked the location of the model file, without success.

ATN
  • Crashes with what error? – kosa Oct 17 '13 at 02:35
  • I added the failure trace, sorry for the delay – ATN Oct 17 '13 at 02:51
  • It seems the parser is being passed a flag it does not recognize, -retainTmpSubcategories. You may want to check the documentation for any mention of it. – kosa Oct 17 '13 at 04:05
  • Hey there, may I ask where you got the germanPCFG.ser.gz file from? Have you trained your own model? I'm kinda desperate having a similar problem -> http://stackoverflow.com/questions/19531208/how-to-use-stanford-corenlp-with-a-non-english-parse-model – David Müller Oct 23 '13 at 11:23

1 Answer

The simple (if confusing) answer should be that you just need to add this line in your Properties setup:

props.put("parse.flags", "");

(This should be fixed, but the flags default to an option that is useful when extracting English dependencies and is neither relevant nor available for other languages, hence the error message above.)
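As a rough sketch, the full setup from the question with the extra line added might look like the following. The class name `GermanParserProps` and the helper `build()` are hypothetical names used only for illustration; the property keys and model paths are copied from the question. The block only builds the `Properties` object, so it runs without CoreNLP on the classpath.

```java
import java.util.Properties;

public class GermanParserProps {

    // Hypothetical helper: the question's properties plus an empty parse.flags.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, parse");
        props.setProperty("ssplit.isOneSentence", "true");
        props.setProperty("pos.model", "pos-taggers/german-fast/german-fast.tagger");
        props.setProperty("pos.maxlen", "30");
        props.setProperty("parse.model", "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz");
        // Override the default flags so that no English-only option is passed
        // to the German parser.
        props.setProperty("parse.flags", "");
        props.setProperty("encoding", "utf-8");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // List the settings so they can be checked before building the pipeline.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " = \"" + props.getProperty(key) + "\"");
        }
    }
}
```

With CoreNLP available, the result would be passed to the pipeline exactly as in the question, that is, `new StanfordCoreNLP(GermanParserProps.build())`.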

HOWEVER, if this were the only problem, you would first see it load the German parser and only then produce the long error dump, like this:

Adding annotator parse
Loading parser from serialized file edu/stanford/nlp/models/lexparser/germanFactored.ser.gz ... done [5.2 sec].
Exception in thread "main" java.lang.IllegalArgumentException: Unknown option: -retainTmpSubcategories

But in the output you show, it is still loading an English parser. So something else must be wrong. I'm not sure about this part, but two possibilities are:

  • You're running an old version of Stanford CoreNLP. A while back, the options were called "parser.model", "parser.flags", etc., but we renamed them for consistency.
  • You don't have a resource called edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz on your CLASSPATH.
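
One way to test the second possibility is to ask the class loader directly whether it can see the model. This is only a sketch using plain JDK calls: the class name `ModelOnClasspath` and the method `isOnClasspath` are hypothetical, and it checks the classpath only, not a model stored as an ordinary file on disk.

```java
import java.net.URL;

public class ModelOnClasspath {

    // Returns true if the named resource is visible to this class's class loader.
    static boolean isOnClasspath(String resource) {
        URL url = ModelOnClasspath.class.getClassLoader().getResource(resource);
        return url != null;
    }

    public static void main(String[] args) {
        // Resource path taken from the question.
        String model = "edu/stanford/nlp/models/lexparser/germanPCFG.ser.gz";
        String status = isOnClasspath(model) ? "found" : "NOT found";
        System.out.println(model + ": " + status + " on the classpath");
    }
}
```

Run it with the same classpath as the failing test. If the model is reported as not found, the jar or directory that contains it is probably missing from that classpath.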
Christopher Manning