
I'm trying to use the Stanford CoreNLP POS tagger on my data.

I used the automatically generated properties file and changed only the open classes.

Is there a complete description of the other fields in this file, such as "arch" and its possible values, "closedClassTagThreshold", "minFeatureThresh", "curWordMinFeatureThresh", "rareWordMinFeatureThresh", ...?
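For reference, here is a sketch of how these fields sit together in a training configuration. It builds a java.util.Properties object so it can be printed in the same key=value form as the generated file. The keys are the ones named above plus other keys that appear in generated files (model, trainFile, tagSeparator, encoding, openClassTags, tokenize). The file names, tag list and numeric values are hypothetical placeholders, not CoreNLP defaults, and the arch string is an assumed example. The MaxentTagger Javadoc is the place to confirm each key's exact meaning and default for your CoreNLP version.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

public class TaggerProps {

    // Builds an illustrative training configuration. Key names follow the
    // generated .props file; every value is a placeholder to adapt.
    public static Properties buildProps() {
        Properties p = new Properties();
        p.setProperty("model", "my-tagger.model");   // hypothetical output path
        p.setProperty("trainFile", "train.txt");     // hypothetical training file
        p.setProperty("tagSeparator", "_");          // character between word and tag
        p.setProperty("encoding", "UTF-8");
        // Feature architecture: comma-separated feature templates (assumed example).
        p.setProperty("arch", "left3words,naacl2003unknowns");
        // Open classes as a whitespace-separated tag list (hypothetical tags).
        p.setProperty("openClassTags", "NN NNS VB VBD JJ");
        // Count thresholds: illustrative values only, not verified defaults.
        p.setProperty("closedClassTagThreshold", "40");
        p.setProperty("curWordMinFeatureThresh", "2");
        p.setProperty("minFeatureThresh", "5");
        p.setProperty("rareWordMinFeatureThresh", "10");
        // Training data is already tokenized: one sentence per line.
        p.setProperty("tokenize", "false");
        return p;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // store() writes one key=value line per entry (order is not guaranteed).
        buildProps().store(out, "POS tagger training config (sketch)");
        System.out.print(out);
    }
}
```

Comparing this side by side with the generated file makes it easier to spot which keys were actually changed.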

When I run the tagger on a text, it chooses the tag that occurs least often in the training data. For example, travel is tagged as a /verb/ 10 times and as a /noun/ 20 times in the training data, but the tagger always tags it as a /verb/, the less frequent tag.
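To make the expected behaviour concrete: a frequency-only baseline would pick the tag seen most often with a word. This is not how the CoreNLP tagger works (it is a maximum-entropy model that also uses context features), so it can legitimately choose /verb/ in a context like "to travel". Consistently picking the rarer tag everywhere, however, suggests the training data is not being read as intended. The sketch counts (word, tag) pairs and returns the majority tag; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class MajorityTag {

    // word -> (tag -> count), gathered from tagged training data
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Records that `word` appeared with `tag` the given number of times.
    public void observe(String word, String tag, int times) {
        counts.computeIfAbsent(word, w -> new HashMap<>()).merge(tag, times, Integer::sum);
    }

    // Returns the tag seen most often with the word, or null if the word is unseen.
    public String mostFrequentTag(String word) {
        Map<String, Integer> tags = counts.get(word);
        if (tags == null) {
            return null;
        }
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : tags.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        MajorityTag m = new MajorityTag();
        m.observe("travel", "verb", 10); // counts from the question
        m.observe("travel", "noun", 20);
        System.out.println(m.mostFrequentTag("travel")); // prints noun
    }
}
```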

Hedieh
  • Can you describe the training data you are using? Can you show some examples of the sentences where this word is tagged in the training data and then show what example in the test it is getting incorrect? Also it would be helpful if you posted the property file you are using and the command you are using to call the POS tagger. – StanfordNLPHelp May 15 '16 at 04:06
  • @StanfordNLPHelp Thanks for the response. I edited my question and added some needed info. – Hedieh May 15 '16 at 07:02
  • If you set tokenize=false, it says it's expecting each line to be a sentence. So make sure there is one sentence per line in your training/dev/test data, and use whitespace to separate tokens. – StanfordNLPHelp May 15 '16 at 22:09
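Following that advice, a quick check of the training file can catch lines that will not parse as word/tag pairs. This sketch assumes "_" is the word/tag separator (use whatever tagSeparator is set to in your properties file). It splits each whitespace-separated token on the last separator, so words that themselves contain "_" still pass, and reports the 1-based line numbers where some token lacks a word or a tag. The class name, method name and sample sentences are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TrainingLineCheck {

    // Returns 1-based line numbers where some token is not of the form word<sep>tag.
    public static List<Integer> badLines(List<String> lines, String sep) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // blank lines are ignored by this check
            }
            for (String token : line.split("\\s+")) {
                int cut = token.lastIndexOf(sep);
                // Need a non-empty word before the separator and a non-empty tag after it.
                if (cut <= 0 || cut + sep.length() >= token.length()) {
                    bad.add(i + 1);
                    break;
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "I_PRP like_VBP to_TO travel_VB",
                "Travel_NN is_VBZ fun_JJ ._.",
                "this line has no tags");
        System.out.println(badLines(sample, "_")); // prints [3]
    }
}
```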

0 Answers