
I want to choose a natural language processing tool for common tasks such as tokenization, sentence detection, and various kinds of tagging (named entity recognition, POS tagging, ...). My question has two parts:

  1. What are the criteria for choosing a natural language processing tool?
  2. Among UIMA, LingPipe, Lucene, GATE, and Stanford, which one satisfies these criteria best?

And what would you suggest?

Frakcool
aliakbarian
  • Could you be more specific regarding your tasks? The comparison really depends on what concrete tasks you want to achieve. – Renaud Sep 18 '13 at 12:36
  • The Languageware Resource Workbench will do what you mention, and output to a UIMA dictionary. However my response is potentially biased and incomplete, so I'm not putting it in as an answer. – Simon O'Doherty Sep 18 '13 at 13:06
  • UIMA is not an NLP tool. It is an interoperability and scaling framework that lets you integrate such tools into a common framework. There are several UIMA component collections that do what you want (e.g. DKPro Core, ClearTK, U-Compare), some of which integrate the tools you mention (e.g. LingPipe, Stanford). GATE is somewhere in between. If you are on Java, I'd probably suggest taking some first steps with Apache OpenNLP (ASL) or Stanford CoreNLP (GPL), depending on which license you prefer. Mind that this is an opinion question and not really suited for Stack Overflow. – rec Sep 18 '13 at 14:22
  • A nice overview can be found here: http://emerge.mc.vanderbilt.edu/natural-language-processing-nlp-survey-tools-resources – peschü Feb 24 '15 at 15:46

1 Answer

Some general criteria:

  1. How many of my tasks can I perform with the provided models (e.g. does the tool contain models for tasks like Spanish tokenisation or protein NER)?
  2. How easy is it for me to add the missing tools or models myself?
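
To make these two criteria a bit more concrete, here is a minimal sketch of how one could tabulate them. Everything in it is an assumption for illustration: `tool_a` and `tool_b` are placeholder names rather than real products, and the task sets are made up, so you would fill them in from each tool's documentation. It reports how many required tasks each candidate covers out of the box (criterion 1) and which ones you would have to add yourself (criterion 2).

```python
# Hypothetical data: the tasks you need, and the tasks each candidate tool
# ships ready-made models for. Tool names and task sets are placeholders,
# not claims about any real tool.
REQUIRED_TASKS = {"tokenization", "sentence_detection", "pos_tagging", "ner"}

CANDIDATES = {
    "tool_a": {"tokenization", "sentence_detection", "pos_tagging"},
    "tool_b": {"tokenization", "ner"},
}


def coverage(required, provided):
    """Return (fraction of required tasks covered, sorted list of missing tasks)."""
    missing = sorted(required - provided)
    covered = len(required) - len(missing)
    return covered / len(required), missing


def rank_tools(required, candidates):
    """Sort tools by coverage, best first; ties are broken alphabetically by name."""
    scored = [(name, *coverage(required, tasks)) for name, tasks in candidates.items()]
    return sorted(scored, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    for name, score, missing in rank_tools(REQUIRED_TASKS, CANDIDATES):
        gaps = ", ".join(missing) or "nothing"
        print(f"{name}: {score:.0%} covered, missing: {gaps}")
```

The list of missing tasks is the part to look at for the second criterion: a tool with lower coverage can still be the better choice if the gaps are easy to fill with your own models.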

BTW, I would add NLTK to your list, along with its excellent, free accompanying book.

Renaud