I am new to OpenNLP and need help customizing the POS tagger
I have used the OpenNLP POS tagger with the pre-trained model en-pos-maxent.bin to tag raw English sentences with their parts of speech. Now I would like to customize the tags.
Example sentence: Dog jumped over the wall.
After POS tagging with en-pos-maxent.bin, the result is:
Dog - NNP
jumped - VBD
over - IN
the - DT
wall - NN
But I want to train my own model and tag the words with my own custom tags, like this:
Dog - PERP
jumped - ACT
over - OTH
the - OTH
wall - OBJ
where PERP, ACT, OTH and OBJ are tags that suit my needs. Is this possible?
I checked the documentation, which gives code to train a model and use it later on. The code goes like this:
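import java.io.*;
import opennlp.tools.postag.*;
import opennlp.tools.util.*;

POSModel model = null;
InputStream dataIn = null;
try {
    // Read the training file line by line and parse each line into a POSSample
    dataIn = new FileInputStream("en-pos.train");
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);
    model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), null, null);
}
catch (IOException e) {
    // Failed to read or parse training data, training failed
    e.printStackTrace();
}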
I don't understand what this "en-pos.train" file is.
What is its format? Can the custom tags be specified in it, or is it something else entirely?
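My guess, which I have not been able to confirm, is that each line of the file holds one sentence, with every token written as word_tag and tokens separated by spaces, since the code reads the file through WordTagSampleStream. Under that assumption, here is a minimal sketch of how I would build such a line for my custom tags. The class TrainingLineSketch and the method toTrainingLine are names I made up for illustration; they are not part of OpenNLP.

```java
import java.util.StringJoiner;

public class TrainingLineSketch {

    // Joins parallel word/tag arrays into one "word_tag word_tag ..." line.
    // Assumption: the trainer expects one sentence per line in this layout.
    static String toTrainingLine(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        StringJoiner line = new StringJoiner(" ");
        for (int i = 0; i < words.length; i++) {
            line.add(words[i] + "_" + tags[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] words = {"Dog", "jumped", "over", "the", "wall"};
        String[] tags  = {"PERP", "ACT", "OTH", "OTH", "OBJ"};
        // prints: Dog_PERP jumped_ACT over_OTH the_OTH wall_OBJ
        System.out.println(toTrainingLine(words, tags));
    }
}
```

If that guess is right, "en-pos.train" would just be a text file with many such lines, one per sentence, using my own tags instead of NNP, VBD and so on.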
Any help would be appreciated.
Thanks