
I am trying to get named entity recognition (NER) to work within Tika. I have followed the guidelines provided here by David Meikle, as well as the guide in the tika-docker examples git repo. I can get Tika server deployed and processing files (text extraction, metadata, OCR, etc.). However, when I send a request to http://localhost:9998/meta as per the instructions, it is meant to attempt NER, but this does not happen.

I have tried a number of different file formats, including plain text files and PDF, which are listed in the custom config, but with no luck. I was expecting the 'parsed by' response to at least state that it was attempting NER, even if it didn't come back with any matches.

When I call http://localhost:9998/parsers there is no mention of the NER/NLP parsers in the list. Does anyone have any idea what I am doing wrong or how to get this to work?
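
For anyone wanting to reproduce the check, here is a minimal sketch of how the parser listing could be inspected from a script. It assumes the server is at http://localhost:9998, that `/parsers` returns a plain-text listing when sent `Accept: text/plain`, and that the NER parser would appear under the class name `org.apache.tika.parser.ner.NamedEntityParser`. The function names are hypothetical helpers, not part of Tika.

```python
import urllib.request

# Assumption: the class name under which the NER parser would be listed.
NER_PARSER_CLASS = "org.apache.tika.parser.ner.NamedEntityParser"


def fetch_parsers(base_url="http://localhost:9998"):
    """Fetch the parser listing from a running Tika server as text.

    Assumption: the /parsers endpoint honours 'Accept: text/plain'.
    """
    request = urllib.request.Request(
        base_url + "/parsers", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def has_ner_parser(listing, class_name=NER_PARSER_CLASS):
    """Return True if the given parser listing mentions the NER parser class."""
    return class_name in listing
```

Against a running server, `has_ner_parser(fetch_parsers())` would be expected to return `False` in the situation described above, since the NER parser is missing from the list.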

  • It looks like https://github.com/apache/tika-docker/blob/master/sample-configs/ner/run_tika_server.sh is written for tika 1.x? In 2.x, we factored out the ner parsers into a separate jar that also has to be on your path: https://dlcdn.apache.org/tika/2.8.0/tika-parser-nlp-package-2.8.0.jar – Tim Allison May 19 '23 at 14:35
  • I just opened this issue on our JIRA: https://issues.apache.org/jira/browse/TIKA-4049 I'm not sure which option is best. Thank you for raising this, and please also check out our mailing lists: https://tika.apache.org/mail-lists.html – Tim Allison May 19 '23 at 14:45
