Is there a way to update the existing Apache OpenNLP POS Tagger model with additional training data? I need to add a few more proper nouns to the model that are specific to my application. When I run the command below:
opennlp POSTaggerTrainer -type maxent -model en-pos-maxent.bin \
-lang en -data en-pos.train -encoding UTF-8
the entire model is retrained from my file alone. I would only like to add a few new sentences to en-pos-maxent.bin.
This is how my training file looks:
Where_WRB is_VBZ the_DT Seven_DNNP Dwarfs_DNNP Mine_DNNP Train_DNNP ?_?
Where_WRB is_VBZ the_DT Astro_DNNP Orbiter_DNNP ?_?
Where_WRB is_VBZ the_DT Barnstormer_DNNP ?_?
Where_WRB is_VBZ the_DT Big_DNNP Thunder_DNNP Mountain_DNNP Railroad_DNNP ?_?
Where_WRB is_VBZ the_DT Buzz_DNNP Lightyears_DNNP Space_DNNP Ranger_DNNP Spin_DNNP ?_?
Where_WRB is_VBZ the_DT Casey_DNNP Jr_DNNP Splash_DNNP N_DNNP Soak_DNNP Station_DNNP ?_?
Where_WRB is_VBZ the_DT Cinderella_DNNP Castle_DNNP ?_?
Where_WRB is_VBZ the_DT Country_DNNP Bear_DNNP Jamboree_DNNP ?_?
Where_WRB is_VBZ the_DT Dumbo_DNNP the_DNNP Flying_DNNP Elephant_DNNP ?_?
Where_WRB is_VBZ the_DT Enchanted_DNNP Tales_DNNP with_DNNP Belle_DNNP ?_?
Where_WRB is_VBZ the_DT Frontierland_DNNP Shootin_DNNP Arcade_DNNP ?_?
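As a sanity check on the format above, here is a minimal Python sketch that splits each token at its last underscore and counts how often each tag occurs. It is not part of OpenNLP; the function name `count_tags` and the two-line sample are only illustrative, and it assumes one sentence per line with whitespace-separated `word_TAG` tokens.

```python
from collections import Counter


def count_tags(lines):
    """Count POS tags in word_TAG sentences (one sentence per line)."""
    counts = Counter()
    for line_no, line in enumerate(lines, start=1):
        for token in line.split():
            # Split at the LAST underscore so words containing "_" still parse.
            word, sep, tag = token.rpartition("_")
            if not sep or not word or not tag:
                raise ValueError(f"line {line_no}: malformed token {token!r}")
            counts[tag] += 1
    return counts


# Two sentences copied from the training file above.
sample = [
    "Where_WRB is_VBZ the_DT Cinderella_DNNP Castle_DNNP ?_?",
    "Where_WRB is_VBZ the_DT Barnstormer_DNNP ?_?",
]
tag_counts = count_tags(sample)
print(tag_counts.most_common())
```

Running this over the whole file shows how heavily the data is skewed toward a single tag such as DNNP, which may be relevant to the behaviour described below.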
After training the model, all words except those in the training file are tagged as DNNP.
For example, if I ask for the word 'Where' (present in the training file) to be tagged, the answer is WRB, but if I ask for the word 'hello' (not present in the training file) to be tagged, it is tagged as DNNP. So I want to add a few words to the existing model without losing what it already knows. How can I do that?