I'm experimenting with part-of-speech tagging and have started using OpenNLP.

I am using the following Java code to load the model:

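        // Open the serialized parser model (about 35 MB on disk).
        m_modelFile = new FileInputStream("c:\\DATA\\en-parser-chunking.bin");
        // Deserializing the model is the slow step.
        m_model = new ParserModel(m_modelFile);
        m_parser = ParserFactory.create(m_model);
        ...
        // Request the single best parse for the sentence.
        Parse[] topParses = ParserTool.parseLine(sentence, m_parser, 1);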

The call that constructs the ParserModel object is very slow, possibly because en-parser-chunking.bin is 35 MB. Is there a better way to use it so that loading isn't this slow? Alternatively, is there a POS tagger you would recommend, or a way of calling the API that is faster?

I've been testing the accuracy, and it's pretty good, but I'm not happy with how long the model takes to load.

Thanks guys.

Phoeniyx

1 Answer

If you are looking for a fast Java (or Python) POS tagger, you might consider RDRPOSTagger, a robust, easy-to-use, language-independent toolkit for POS and morphological tagging. It is fast in both training and tagging: in Java, for example, it tags about 90K English words per second on a 2.4 GHz Core 2 Duo machine. Its accuracy is also very competitive with state-of-the-art results. See this paper for experimental results, including speed and tagging accuracy, on 13 languages.
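
Whichever tagger you end up with, the load cost is paid every time the model object is constructed, so a common mitigation is to load it once (through a buffered stream) and reuse the instance. Below is a minimal sketch of that idea using only the standard library. `CachedModel` and `Loader` are hypothetical names made up for illustration, not part of OpenNLP. With OpenNLP you would presumably pass a constructor reference such as `POSModel::new` as the loader and point it at a POS-only model file instead of the full parser model, but treat that wiring as an assumption and check it against the version you use.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: reads a model from disk at most once, then reuses it. */
final class CachedModel<T> {

    /** Turns a stream into a model object (assumed to fit e.g. OpenNLP's POSModel::new). */
    interface Loader<T> {
        T load(InputStream in) throws IOException;
    }

    private final Path path;
    private final Loader<T> loader;
    private volatile T model; // null until the first successful load

    CachedModel(Path path, Loader<T> loader) {
        this.path = path;
        this.loader = loader;
    }

    /** Returns the cached model, loading it on the first call only. */
    T get() {
        T local = model;
        if (local == null) {
            synchronized (this) {
                local = model;
                if (local == null) {
                    // Buffer the file stream so the loader does not hit the disk per small read.
                    try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
                        local = loader.load(in);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    model = local;
                }
            }
        }
        return local;
    }
}
```

You would create one `CachedModel` at application startup and call `get()` wherever the model is needed. This does not make the first load faster by itself, but it keeps the cost from being paid more than once per process.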

NQD