-1

I created a UIMA stack using OpenNLP that runs locally across all cores. It performs a variety of tasks, including reading from a CSV file, inserting text into a database, parsing the text, POS tagging, chunking, etc. I also got it to run a variety of tasks across a Spark cluster.

We want to add some machine learning algorithms to the stack, and Deeplearning4j (DL4J) came up as a very viable option. Unfortunately, it is not clear how to integrate DL4J with what we currently have, or whether it simply replicates the stack I have now.

What I have not found on the UIMA, ClearTK, and Deeplearning4j sites is how these three libraries fit together. Does Deeplearning4j implement a set of ClearTK abstract classes that call OpenNLP functions? What benefit does ClearTK provide? Do I need to worry about how Deeplearning4j implements anything with the ClearTK framework?

Thanks!

Ben Holland

2 Answers

1

As far as I understand, you're running a UIMA pipeline that uses some OpenNLP-based AnalysisEngines; so far that's fine. What is not clear from your question is what you're looking for in terms of features rather than tooling, so I think that's the first thing to clarify.

Other than that, Apache UIMA is an architectural framework; within it you can integrate OpenNLP, DL4J, ClearTK, or anything else that is useful for your unstructured information processing task.
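To make the "architectural framework" point concrete, here is a minimal plain-Java sketch of the idea, not actual UIMA, OpenNLP, or DL4J code. `Doc`, `Stage`, `TokenStage`, and `LengthFeatureStage` are hypothetical stand-ins: `Doc` plays the role of the shared document object that engines read and annotate, and each `Stage` plays the role of an analysis engine. The point is only that a machine learning step would be one more stage consuming what earlier stages produced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PipelineSketch {

    /** Hypothetical stand-in for the shared document: text plus named annotation layers. */
    static class Doc {
        final String text;
        final Map<String, List<String>> layers = new LinkedHashMap<>();
        Doc(String text) { this.text = text; }
    }

    /** Hypothetical stand-in for an analysis engine: reads the Doc and adds a layer. */
    interface Stage {
        void process(Doc doc);
    }

    /** Plays the role of an OpenNLP-backed tokenizer engine (whitespace split only). */
    static class TokenStage implements Stage {
        public void process(Doc doc) {
            List<String> tokens = new ArrayList<>();
            for (String t : doc.text.trim().split("\\s+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
            doc.layers.put("tokens", tokens);
        }
    }

    /** Plays the role of an ML engine that consumes the tokens produced upstream. */
    static class LengthFeatureStage implements Stage {
        public void process(Doc doc) {
            List<String> features = new ArrayList<>();
            for (String t : doc.layers.get("tokens")) {
                features.add(t + ":" + t.length());
            }
            doc.layers.put("features", features);
        }
    }

    /** The "framework" part: run every stage in order over one shared Doc. */
    static Doc run(String text, List<Stage> stages) {
        Doc doc = new Doc(text);
        for (Stage s : stages) {
            s.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = run("UIMA hosts the engines",
                Arrays.asList(new TokenStage(), new LengthFeatureStage()));
        System.out.println(doc.layers);
    }
}
```

In a real pipeline the stages would be UIMA analysis engines and the shared object would be the CAS; the ordering constraint is the same, since the feature stage here fails if no tokenizer stage ran before it.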

In the Apache OpenNLP project we're running some experiments on integrating different DL frameworks; you can have a look at https://issues.apache.org/jira/browse/OPENNLP-1009 (current prototypes are based on DL4J).

Since you mentioned you're leveraging an Apache Spark cluster, DL4J might be a good fit as it should integrate smoothly with it.

  • Right! Thanks! I think the problem that we are trying to figure out is how. For example, I have the stack working with the OpenNLP chain. I don't see how ClearTK fits within the framework or extends it. Is it something that I have to call or configure explicitly? In addition, once OpenNLP has done the low-level processing, do I then call the DL4J methods (e.g., word2vec)? – Ben Holland Dec 14 '17 at 16:28
0

We only use UIMA as part of a set of interfaces for NLP with DL4J: a tokenizer factory and tokenizer that use UIMA internally for tokenization and sentence segmentation, together with our sentence iterator interface. That's very different from building your own models with Deeplearning4j itself.
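As a rough illustration of that separation, here is a plain-Java sketch that uses no DL4J or UIMA classes. The `Tokenizer` and `TokenizerFactory` interfaces below are hypothetical stand-ins loosely modeled on the idea described above, and `WhitespaceTokenizerFactory` and `countVocabulary` are invented for the example. The model-side code depends only on the factory interface, so a UIMA-backed tokenizer could be swapped in without the model code changing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TokenizerSeamSketch {

    /** Hypothetical stand-in: yields the tokens of one piece of text. */
    interface Tokenizer {
        boolean hasMoreTokens();
        String nextToken();
    }

    /** Hypothetical stand-in: the only thing the model-side code knows about. */
    interface TokenizerFactory {
        Tokenizer create(String text);
    }

    /** Simple backend; a UIMA-backed factory would implement the same interface. */
    static class WhitespaceTokenizerFactory implements TokenizerFactory {
        public Tokenizer create(String text) {
            String trimmed = text.trim();
            List<String> parts = trimmed.isEmpty()
                    ? Collections.<String>emptyList()
                    : Arrays.asList(trimmed.split("\\s+"));
            final Iterator<String> it = parts.iterator();
            return new Tokenizer() {
                public boolean hasMoreTokens() { return it.hasNext(); }
                public String nextToken() { return it.next(); }
            };
        }
    }

    /** Model-side code: counts lowercased tokens without knowing the backend. */
    static Map<String, Integer> countVocabulary(List<String> sentences, TokenizerFactory factory) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String sentence : sentences) {
            Tokenizer tokenizer = factory.create(sentence);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken().toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("The cat sat", "the dog sat");
        System.out.println(countVocabulary(sentences, new WhitespaceTokenizerFactory()));
    }
}
```

Vocabulary counting stands in for model training here only because it is the simplest consumer of tokens; the design point is the interface boundary, not the counting.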

Adam Gibson
  • So would it be fair to say that DL4J is a nearly entirely separate project from the UIMA framework, though it might incorporate UIMA when it comes to abstracting common NLP tasks? – Ben Holland Dec 14 '17 at 16:30
  • Yeah definitely! We largely use it as an external tokenizer/sentence segmenter. – Adam Gibson Dec 15 '17 at 07:43