
I know that this question has been asked before, but the answer was not satisfying (it was just a link).

So my question is: is there any way to extend the existing OpenNLP models? I already know about the technique with DBpedia/Wikipedia. But what if I just want to append some lines of text to improve the models? Is there really no way? (If so, that would be really stupid...)

MWiesner
Fabian Lurz

2 Answers


Unfortunately, you can't. See this question which has a detailed answer to the same problem.

I think this is a tough problem, because when you deal with text you often run into licensing issues. For example, you cannot build a corpus from Twitter data and publish it to the community (see this paper for more information).

Companies therefore often build domain-specific corpora and use them internally. We did the same in our research project, and built a tool (Quick Pad Tagger) to create annotated corpora efficiently (see here).

Community
schrieveslaach
  • Wow, OK. Thanks for your help. That really sucks! OpenNLP would benefit a lot if more people trained the models! – Fabian Lurz Sep 22 '15 at 09:29
  • I provided some additional information (see updated answer). I hope it is helpful to you. Would you mind marking the answer as correct? – schrieveslaach Sep 22 '15 at 09:39
  • Sure :) Forgot that. Thanks a lot for your help. I'm working right now, but I will have a detailed look at the links later! Your F-scores are impressive! Good job on that – Fabian Lurz Sep 22 '15 at 11:55
  • I just realized that this is really awesome :) Is there a download link? I would like to try this tool – Fabian Lurz Sep 22 '15 at 17:20
  • Ah, sorry for posting so much: definitely have a look at the Yago database. It is open source and I think that you can somehow use it to train the models – Fabian Lurz Sep 22 '15 at 17:41
  • Can you write me an e-mail (first author in the paper) and then we can discuss how you can get the tool? I would really like to open source the tool but I do not have the permission yet. – schrieveslaach Sep 22 '15 at 19:54
  • @FabianLurz, FYI, you might want to check out [NLPf](https://gitlab.com/schrieveslaach/NLPf/). – schrieveslaach Aug 13 '18 at 09:16

OK, I think this needs a separate answer. I found the Yago database: http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago//

This database seems fantastic at first glance. You can download all the tagged data and load it into a database (they already provide the tools for that).

The next step is to convert the tagged entities into a format that OpenNLP can use (OpenNLP expects something like this: <START:person> Pierre Vinken <END>).

Then you write the converted sentences to text files and train a model on them with the training tool that ships with OpenNLP.
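The conversion step could look roughly like the sketch below. It is only an illustration, not something I have run against Yago: it assumes you have already exported tokenized sentences together with entity spans, and the span layout `(start_token, end_token_exclusive, label)` as well as the function names `to_opennlp_line` and `write_training_file` are made up for this example. As far as I know, the name finder training data is one whitespace-tokenized sentence per line.

```python
from typing import Iterable, List, Tuple

# Hypothetical span layout: (start_token, end_token_exclusive, entity_label)
Span = Tuple[int, int, str]


def to_opennlp_line(tokens: List[str], spans: List[Span]) -> str:
    """Render one tokenized sentence with <START:label> ... <END> markers."""
    for start, end, _ in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span ({start}, {end})")

    # Look up spans by their first token so we can emit markers in one pass.
    by_start = {start: (end, label) for start, end, label in spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, label = by_start[i]
            out.append(f"<START:{label}>")
            out.extend(tokens[i:end])
            out.append("<END>")
            i = end  # continue after the entity
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


def write_training_file(path: str,
                        sentences: Iterable[Tuple[List[str], List[Span]]]) -> None:
    """Write one annotated sentence per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as fh:
        for tokens, spans in sentences:
            fh.write(to_opennlp_line(tokens, spans) + "\n")


example = to_opennlp_line(
    ["Pierre", "Vinken", "will", "join", "the", "board", "."],
    [(0, 2, "person")],
)
print(example)
```

The resulting file can then be passed to the command-line trainer. The OpenNLP manual documents an invocation along the lines of `opennlp TokenNameFinderTrainer -model en-ner-person.bin -lang en -data en-ner-person.train -encoding UTF-8`; check the manual for your version, since the flags may differ.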

I'm not 100% sure this works, but I will come back and report.

Fabian Lurz