Hi, I know that the Stanford NER model english.muc.7class.distsim.crf.ser.gz classifies text into 7 classes: Location, Person, Organization, Money, Percent, Date, Time. However, I want to classify text into a different set of classes, for example person full name, money, date, time, location, degree, etc. Please let me know how to build a custom model with an NLP library such as Stanford NLP, GATE, or OpenNLP.
You will need training data specific to what you want to tag. Also, check the FAQ: http://nlp.stanford.edu/software/crf-faq.shtml#a – David Batista Jun 05 '16 at 11:13
2 Answers
Well, if you use OpenNLP, create your training data as described in the documentation:
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
Those tags are what you have to add for each of the different entities you want to find. Then use the training API or the CLI described in the documentation to build your models.
Also, if your training set has around 15,000 lines, you can expect good results!
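If your annotations are stored as token offsets rather than inline tags, a small helper can emit lines in this format. Below is a minimal sketch in plain Java with no OpenNLP dependency. The names `TrainingLineBuilder`, `Span`, and `annotate` are hypothetical and only for illustration, and the sketch assumes the spans are sorted and do not overlap.

```java
import java.util.Arrays;
import java.util.List;

class TrainingLineBuilder {

    /** Hypothetical helper type: a labelled token range [start, end). */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /**
     * Joins tokens with single spaces and wraps each span in
     * <START:type> ... <END>. Assumes spans are sorted and non-overlapping.
     */
    static String annotate(List<String> tokens, List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        int next = 0; // index of the next span to process
        for (int i = 0; i < tokens.size(); i++) {
            if (next < spans.size() && spans.get(next).start == i) {
                sb.append("<START:").append(spans.get(next).type).append("> ");
            }
            sb.append(tokens.get(i));
            if (next < spans.size() && spans.get(next).end == i + 1) {
                sb.append(" <END>");
                next++;
            }
            if (i < tokens.size() - 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "Mr", ".", "Vinken", "is", "chairman", "of", "Elsevier", "N.V.", ".");
        List<Span> spans = Arrays.asList(new Span(2, 3, "person"));
        System.out.println(annotate(tokens, spans));
    }
}
```

Every token and tag is separated by a single space, matching the sample lines above, with one sentence per output line.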
In OpenNLP, you can create a custom NER model using the steps below.
First, you need to annotate your training data in the format <START:entity-name> ... <END>. Let's say you want to create a medicine NER model. The data would then look something like this:
Example:
<START:medicine> Augmentin-Duo <END> is a penicillin antibiotic that contains two medicines - <START:medicine> amoxicillin trihydrate <END> and <START:medicine> potassium clavulanate <END> .
They work together to kill certain types of bacteria and are used to treat certain types of bacterial infections .
The training data should contain at least 15,000 sentences to get good results.
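Before training, it can help to sanity-check the file. The following is a rough sketch in plain Java, not part of OpenNLP, that counts non-blank lines as sentences and flags lines whose tags do not pair up. `TrainingDataCheck` and its methods are hypothetical names, and the check assumes tags are separated from other tokens by whitespace.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class TrainingDataCheck {

    // Sentence count recommended in the answer above.
    static final int RECOMMENDED_SENTENCES = 15000;

    /** True if every <START:type> is closed by <END> before another tag opens. */
    static boolean isWellFormed(String line) {
        boolean open = false;
        for (String token : line.trim().split("\\s+")) {
            if (token.startsWith("<START:") && token.endsWith(">")) {
                if (open) return false;   // previous entity was never closed
                open = true;
            } else if (token.equals("<END>")) {
                if (!open) return false;  // <END> without a matching start
                open = false;
            }
        }
        return !open;
    }

    /** 1-based numbers of malformed lines; blank lines are skipped. */
    static List<Integer> malformedLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.trim().isEmpty() && !isWellFormed(line)) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    /** Number of non-blank lines, treating each as one sentence. */
    static int sentenceCount(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the training file, e.g. drugsDetails.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        int count = sentenceCount(lines);
        System.out.println("Sentences: " + count);
        if (count < RECOMMENDED_SENTENCES) {
            System.out.println("Fewer than " + RECOMMENDED_SENTENCES + " sentences");
        }
        System.out.println("Malformed lines: " + malformedLines(lines));
    }
}
```

A tag fused with punctuation, such as `<END>.`, is not recognised as a closing tag by this check, so the line is reported as malformed.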
Then train a TokenNameFinderModel, supplying the desired model file name and the path to the data file.
You can create one from the command line like this:
$ opennlp TokenNameFinderTrainer -model en-ner-drugs.bin -lang en -data drugsDetails.txt -encoding UTF-8
To do the same in Java, refer to this post: Writing a custom NameFinder model in OpenNLP.
