I am new to Stanford CoreNLP; previously I worked with the Moses project. So far I have run the demo file ParserDemo2, and everything worked fine with the englishPCFG.caseless.ser.gz model. Now I need to create my own model from an English monolingual text corpus that I have.

From searching, I found that I need to create a treebank and use the trainFromTreebank method in the LexicalizedParser class.

I am really confused about how to do this.

Can you provide some information or point me to the documentation on how to do so?

user2800040

1 Answer

The Stanford Parser FAQ answers: "Can I train the parser?"

It's probably easiest to start with a vanilla PCFG model and then work your way up with state-splitting, etc. to more complex models. See "Can I just use the parser as a vanilla PCFG parser?"

Jon Gauthier
  • I am confused about how to convert a plain monolingual corpus into Penn Treebank format. I went to https://www.cis.upenn.edu/~treebank/ but didn’t find anything useful. – user2800040 Jul 17 '15 at 09:52
  • What does "monolingual corpus" mean? What does the data look like? It needs to have constituency parse annotations of some sort in place already. – Jon Gauthier Jul 17 '15 at 14:17
  • All I have is a large corpus of English sentences, which I need to use to train the model. – user2800040 Jul 17 '15 at 15:08
  • You need labeled data to build a parser with Stanford models or otherwise — i.e., examples of how sentences are parsed. That means your data must come with tree annotations, in the Penn Treebank format or a similar format. – Jon Gauthier Jul 18 '15 at 05:48
  • Can you point me to any link on how to convert it to Penn Treebank format? I am not able to find one through a Google search. – user2800040 Jul 18 '15 at 08:14
  • The data needs to be manually annotated with parse annotations. You can't build a good parser for the text otherwise. – Jon Gauthier Jul 22 '15 at 14:36
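
To make the difference between raw text and tree-annotated data concrete, here is a minimal sketch in plain Java (no Stanford classes involved). It assumes one Penn Treebank style bracketed tree per string; the class name `TreebankYield` and the example tree are hypothetical and hand-written for illustration, not taken from any real treebank. The sketch only recovers the plain sentence from a tree. That direction is mechanical; going from a plain sentence to a tree is the annotation step that has to be done by hand.

```java
import java.util.ArrayList;
import java.util.List;

public class TreebankYield {

    /** Returns the words (leaves) of one bracketed tree, left to right. */
    public static List<String> leaves(String tree) {
        List<String> words = new ArrayList<>();
        // Pad parentheses with spaces so a whitespace split yields clean tokens.
        String[] tokens = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        // True right after "(": the next non-parenthesis token is a category label.
        boolean expectLabel = false;
        for (String tok : tokens) {
            if (tok.isEmpty()) {
                continue; // produced only by an empty input string
            }
            if (tok.equals("(")) {
                expectLabel = true;
            } else if (tok.equals(")")) {
                expectLabel = false;
            } else if (expectLabel) {
                expectLabel = false; // a label such as NP or VBD, not a word
            } else {
                words.add(tok); // a word of the sentence
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Hand-written illustrative tree: every word carries a part-of-speech
        // tag and every phrase carries a constituent label.
        String tree = "(ROOT (S (NP (DT The) (NN cat)) "
                + "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))) (. .)))";
        System.out.println(String.join(" ", leaves(tree)));
    }
}
```

A corpus of plain sentences contains only what `leaves` returns. The labels and bracketing that a parser is trained on are exactly the part that is missing from it.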