I am new to OpenNLP. I use it to find location names in a sentence. My input string is "Italy pardons US colonel in CIA case", but "Italy" does not appear in the result set. How can I solve this problem? Thanks in advance!

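import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

// Both model streams are closed automatically when the try block exits
try (InputStream tokenIn = new FileInputStream("en-token.bin");
     InputStream nerIn = new FileInputStream("en-ner-location.bin")) {
   Tokenizer tokenizer = new TokenizerME(new TokenizerModel(tokenIn));
   NameFinderME nameFinder = new NameFinderME(new TokenNameFinderModel(nerIn));

   // documentStr holds the input sentence
   String[] tokens = tokenizer.tokenize(documentStr);
   Span[] nameSpans = nameFinder.find(tokens);
   for (Span span : nameSpans) {
      System.out.println("Span: " + span);
   }
}
catch (Exception e) {
   System.out.println(e.toString());
}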
Aubin
Dung TQ
  • Try to move `modelIn.close();` after the `for()` loop – Aubin Apr 23 '13 at 03:56
  • Thank you for your reply. I moved `modelIn.close();` after the `for()` loop, but it still returns an empty result. If I replace Italy with France, it works fine. I don't know why it cannot detect some location names such as Italy, Italia, and England. – Dung TQ Apr 23 '13 at 08:24
  • I changed the sentence to "California near Arizona" and it was able to tell that Arizona is a place, but gave no output for California. I am afraid the training data is incomplete. – akshayb May 06 '13 at 07:45
  • Hi Akshayb, I tested many other input strings and found that if the location name is the first word of the sentence, as in the Italy example, the program cannot recognize it. With your input sentence, it returns both California and Arizona; I don't know why it returns different results. I downloaded the models from this link: http://opennlp.sourceforge.net/models-1.5/ – Dung TQ May 07 '13 at 03:49
  • I used the POS tagger model (download link: opennlp.sourceforge.net/models-1.5/en-pos-maxent.bin) to solve my problem. Tagging your input sentence gives "California_NNP near_IN Arizona_NNP". Not only can I detect California and Arizona, but I can also find other types (organizations, persons), because they are tagged NNP or NNPS. – Dung TQ May 07 '13 at 04:09

1 Answer

OpenNLP results depend on the data the model was trained on. The en-ner-location.bin file on SourceForge may not contain samples that resemble your data. Also, extracting proper nouns (NNP) with a chunker or POS tagger will not isolate locations only. So the answer to your question is: the model doesn't account for every case in your data, and that is why you don't get a hit on this particular sentence. BTW, NER is never perfect and will always return some false positives and false negatives.
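
To illustrate the point about NNP tags, here is a minimal sketch in plain Java (no OpenNLP calls) that filters `word_TAG` output for NNP/NNPS. The class name `ProperNounFilter` and the method `properNouns` are hypothetical. The first tagged string is the one quoted in the comments; the second was tagged by hand for illustration and is an assumption, not real tagger output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps every token tagged NNP or NNPS.
final class ProperNounFilter {

    // Expects whitespace-separated "word_TAG" pairs, as in the comments above.
    static List<String> properNouns(String tagged) {
        List<String> result = new ArrayList<>();
        for (String pair : tagged.trim().split("\\s+")) {
            int sep = pair.lastIndexOf('_');
            if (sep <= 0) {
                continue; // no tag (or empty word): skip
            }
            String tag = pair.substring(sep + 1);
            if (tag.equals("NNP") || tag.equals("NNPS")) {
                result.add(pair.substring(0, sep));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Tagged output quoted in the comments
        System.out.println(properNouns("California_NNP near_IN Arizona_NNP"));
        // Hand-tagged for illustration (assumption, not tagger output)
        System.out.println(properNouns(
            "Italy_NNP pardons_VBZ US_NNP colonel_NN in_IN CIA_NNP case_NN"));
    }
}
```

With the hand-tagged sentence, the filter keeps CIA alongside Italy, so a tag-based approach still needs a separate step (a gazetteer lookup, or a better-trained NER model) to decide which proper nouns are actually locations.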

Mark Giaconia