Stanford CoreNLP train model from text file like englishPCFG.ser.gz

Question

I am new to Stanford CoreNLP; previously I worked with the Moses project. So far I have worked with the demo file ParserDemo2, and everything worked fine using the englishPCFG.caseless.ser.gz model. Now I need to create my own model from an English monolingual text corpus that I have.

From my searching so far, I found that I need to create a treebank and use the trainFromTreebank method in the LexicalizedParser class.

I am really confused how to do this.

Can you provide some information or point me to the documentation on how to do so?

score 0 · Answer 1 · answered Jul 16 '15 at 14:31

The Stanford Parser FAQ answers: "Can I train the parser?"

It's probably easiest to start with a vanilla PCFG model and then work your way up, with state-splitting etc., to more complex models. See "Can I just use the parser as a vanilla PCFG parser?"
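To make "vanilla PCFG" concrete: training one from a treebank amounts to counting the rules used in the annotated trees and normalizing, P(A -> beta) = count(A -> beta) / count(A). The following is a toy Java sketch of that relative-frequency estimate, not CoreNLP's implementation (CoreNLP's own training code does considerably more than this). The class VanillaPcfg and its methods are hypothetical, and the parser assumes every bracket in the input begins with a label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy sketch (hypothetical, not CoreNLP): maximum-likelihood PCFG estimation from bracketed trees. */
class VanillaPcfg {

    /** A minimal tree node: a label plus zero or more children (words have none). */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    /** Parses one bracketed tree such as "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))". */
    static Node parse(String s) {
        // Pad parentheses with spaces so a whitespace split yields tokens.
        String[] toks = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        int[] pos = {0};
        return parseNode(toks, pos);
    }

    private static Node parseNode(String[] toks, int[] pos) {
        if (!toks[pos[0]].equals("(")) {
            return new Node(toks[pos[0]++]); // a bare token is a word (leaf)
        }
        pos[0]++;                                // consume "("
        Node n = new Node(toks[pos[0]++]);       // assumes a label follows every "("
        while (!toks[pos[0]].equals(")")) {
            n.children.add(parseNode(toks, pos));
        }
        pos[0]++;                                // consume ")"
        return n;
    }

    /** Counts every rule A -> B C ..., including preterminal rules such as NN -> dog. */
    static void countRules(Node n, Map<String, Integer> ruleCounts, Map<String, Integer> lhsCounts) {
        if (n.children.isEmpty()) return; // words expand to nothing
        StringBuilder rule = new StringBuilder(n.label).append(" ->");
        for (Node c : n.children) {
            rule.append(' ').append(c.label);
            countRules(c, ruleCounts, lhsCounts);
        }
        ruleCounts.merge(rule.toString(), 1, Integer::sum);
        lhsCounts.merge(n.label, 1, Integer::sum);
    }

    /** Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A). */
    static Map<String, Double> estimate(List<String> treebank) {
        Map<String, Integer> ruleCounts = new TreeMap<>();
        Map<String, Integer> lhsCounts = new TreeMap<>();
        for (String t : treebank) countRules(parse(t), ruleCounts, lhsCounts);
        Map<String, Double> probs = new TreeMap<>();
        for (Map.Entry<String, Integer> e : ruleCounts.entrySet()) {
            String lhs = e.getKey().substring(0, e.getKey().indexOf(" ->"));
            probs.put(e.getKey(), e.getValue() / (double) lhsCounts.get(lhs));
        }
        return probs;
    }

    public static void main(String[] args) {
        // Two hand-annotated toy trees stand in for a real treebank.
        List<String> treebank = List.of(
            "(S (NP (DT the) (NN dog)) (VP (VBZ barks)))",
            "(S (NP (NNP Rex)) (VP (VBZ sees) (NP (DT the) (NN cat))))");
        estimate(treebank).forEach((rule, p) -> System.out.printf("%-16s %.3f%n", rule, p));
    }
}
```

Running main prints each rule with its estimated probability; for example, NP -> DT NN gets 2/3 because two of the three NP nodes in the toy trees expand that way. Note that the counting only works because the NP and VP nodes are already present in the input trees.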

Jon Gauthier

I am confused about how to convert a normal monolingual corpus into Penn Treebank format. I went to https://www.cis.upenn.edu/~treebank/ but didn't find anything useful. – user2800040 Jul 17 '15 at 09:52
What does "monolingual corpus" mean? What does the data look like? It needs to have constituency parse annotations of some sort in place already. – Jon Gauthier Jul 17 '15 at 14:17
All I have is a large corpus of English sentences, which I need to use to train the model. – user2800040 Jul 17 '15 at 15:08
You need labeled data to build a parser with Stanford models or otherwise — i.e., examples of how sentences are parsed. That means your data must come with tree annotations, in the Penn Treebank format or a similar format. – Jon Gauthier Jul 18 '15 at 05:48
Can you point me to a link on how to convert it to Penn Treebank format? I can't find anything through a Google search. – user2800040 Jul 18 '15 at 08:14
The data needs to be manually annotated with parse annotations. You can't build a good parser for the text otherwise. – Jon Gauthier Jul 22 '15 at 14:36
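A small illustration of why raw sentences are not enough: in Penn Treebank bracketed format, the sentence itself is only the leaves of the tree, and different trees can share the same leaves. The toy Java sketch below (the class TreebankLeaves is hypothetical, not part of CoreNLP) extracts the words from two hand-written analyses of one ambiguous sentence. Both yield the same word sequence, so the tree structure has to come from annotation rather than from the text.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch (hypothetical, not CoreNLP): the words of a bracketed tree are just its leaves. */
class TreebankLeaves {

    /** Returns the words (leaves) of one bracketed tree, in order. */
    static List<String> leaves(String tree) {
        String[] toks = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        List<String> words = new ArrayList<>();
        for (int i = 1; i < toks.length; i++) {
            boolean isBracket = toks[i].equals("(") || toks[i].equals(")");
            // A label always directly follows "(", so any other non-bracket token is a word.
            if (!isBracket && !toks[i - 1].equals("(")) words.add(toks[i]);
        }
        return words;
    }

    public static void main(String[] args) {
        // Two hand-annotated analyses of one sentence: PP attached to the verb vs. to the noun.
        String verbAttach = "(S (NP (PRP I)) (VP (VBD saw) (NP (DT the) (NN man))"
            + " (PP (IN with) (NP (DT a) (NN telescope)))))";
        String nounAttach = "(S (NP (PRP I)) (VP (VBD saw) (NP (NP (DT the) (NN man))"
            + " (PP (IN with) (NP (DT a) (NN telescope))))))";
        System.out.println(String.join(" ", leaves(verbAttach)));
        // Same words, different trees: the text alone cannot tell them apart.
        System.out.println(leaves(verbAttach).equals(leaves(nounAttach)));
    }
}
```

Running it prints "I saw the man with a telescope" followed by true.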