I want to train my own model for the Named Entity Recognition functionality in OpenNLP. I wrote a piece of code following http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind

I started with a trivial example, trying to train a “number” entity type, and marked every \d+ match in the training file like this:

In <START:number> 1941 <END>, Paramount Pictures produced a movie version of the play.
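For illustration, the marking step described above could be sketched as follows. This is a minimal sketch, not the code used to build the training file in the question: the class name `NumberTagger` and method `tag` are hypothetical, and it assumes each input line is one sentence that has already been tokenized on whitespace (punctuation split off as separate tokens). Every tag it emits is surrounded by single spaces.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of OpenNLP): marks all-digit tokens as "number" entities.
public class NumberTagger {

    // Assumes the line is already whitespace-tokenized, one sentence per line.
    static String tag(String tokenizedLine) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : tokenizedLine.trim().split("\\s+")) {
            if (token.matches("\\d+")) {
                // Keep a space between the tags and the token they wrap.
                out.add("<START:number> " + token + " <END>");
            } else {
                out.add(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tag("In 1941 , Paramount Pictures produced a movie version of the play ."));
    }
}
```

Because the tagger only joins tokens with spaces, an `<END>` tag can never end up glued to a following punctuation mark, provided the input was tokenized as assumed.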

The code is:

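    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.nio.charset.Charset;
    import java.util.Collections;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;

    // The fields and main method below are members of the training class.
    static String markedFile = "C:/MyStuff/eclipse_workspace/OpenNlpTest/src/NameFinderTraining/en-ner-number-marked.train";
    static String modelFile  = "C:/MyStuff/eclipse_workspace/OpenNlpTest/src/NameFinderTraining/en-ner-number-marked.bin";

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception
    {
        // Read the annotated training file line by line (one sentence per line).
        Charset charset = Charset.forName("UTF-8");
        ObjectStream<String> lineStream =
                new PlainTextByLineStream(new FileInputStream(markedFile), charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model;

        try
        {
            // Type argument is "person" as in the original code, although the data is tagged "number".
            // 100 iterations, cutoff 5.
            model = NameFinderME.train("en", "person", sampleStream,
                    Collections.<String, Object>emptyMap(), 100, 5);
        }
        finally
        {
            sampleStream.close();
        }

        // Write the trained model to disk.
        BufferedOutputStream modelOut = null;
        try
        {
            modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
            model.serialize(modelOut);
        }
        finally
        {
            if (modelOut != null)
                modelOut.close();
        }
    }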

I got the following exception:

Computing event counts...  java.io.IOException: Found unexpected annotation while handling a name sequence: until the ###<START:number>### 1950 <END>s

My guess is that “number” is not in the default annotation list. What should I do? If I need a “custom annotation”, could someone give me an example?

  • I actually figured out the error by looking at the RegEx in the training code, and after fixing it I got further. But now I get "Model not compatible with name finder!" in the middle of "Computing model parameters ..." – user1623058 Nov 21 '12 at 13:24
  • The error was in the spaces. I looked at the RegEx in the NameFinder code. – user1623058 Nov 24 '12 at 14:50
  • The error was in the spaces. I looked at the RegEx in the NameSample class code and figured it out. – user1623058 Nov 24 '12 at 14:57

1 Answer

OpenNLP throws this kind of exception when a tag is not recognized properly.

Try removing any special characters directly before or after a tag, so that each tag is separated from the surrounding text by whitespace.

<END>. is invalid.
<END> . is valid. 
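
As a rough illustration of that fix, the sketch below pads every tag in a training line with spaces. The class name `TagSpacer`, the method `spaceOutTags`, and the tag regex are assumptions made for this example (the regex approximates the `<START:type>` / `<END>` syntax shown in the question, it is not copied from OpenNLP), so treat it as a starting point rather than a definitive repair tool.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of OpenNLP): puts whitespace around annotation tags.
public class TagSpacer {

    // Assumed tag syntax: <START>, <START:type> or <END>.
    private static final Pattern TAG = Pattern.compile("<START(:[^:>\\s]*)?>|<END>");

    static String spaceOutTags(String line) {
        // Surround each tag with spaces, then collapse runs of whitespace.
        String padded = TAG.matcher(line).replaceAll(" $0 ");
        return padded.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(spaceOutTags("until the <START:number> 1950 <END>s"));
        System.out.println(spaceOutTags("a movie in <START:number> 1941 <END>."));
    }
}
```

Note that padding changes the tokenization: `1950 <END>s` becomes `1950 <END> s`, leaving `s` as its own token, so repaired lines are worth reviewing by hand before training.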
Jai Bhatt