How to NLP-process a doc line by line in DeepLearning4j?

Question

I have a question regarding SentenceIterator/DocumentIterator for NLP. Each line in my file represents a short document consisting of one or more sentences. I would like to pass each line to the UIMA NLP processor and get back a list of POS-tagged sentences for that single line (that is, for one document), say a List of PosTaggedSentences. Is there something in the DL4J library that can do this?

SentenceIterator iter = UimaSentenceIterator.createWithPath(filePath);

This code splits all sentences in the file into individual ones, but it doesn't preserve the structure of one document per line.

Any suggestions on how to do this in DL4J?

score 0 · Answer 1 · answered May 19 '17 at 14:40

Why not instantiate the UimaSentenceIterator in your code? The DeepLearning4j docs suggest doing so with the following example:

For anything complex, we recommend an actual machine-learning level pipeline, represented by the UimaSentenceIterator.

SentenceIterator iter = new UimaSentenceIterator(path,AnalysisEngineFactory.createEngine(
    AnalysisEngineFactory.createEngineDescription(
        TokenizerAnnotator.getDescription(), SentenceAnnotator.getDescription())));
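
If the goal is to keep one document per line, one option is to read the file yourself and run sentence splitting on each line separately, so that the result is a list of sentences per line. Below is a minimal sketch of that idea, assuming plain Java only: it uses java.text.BreakIterator as a stand-in for the UIMA pipeline, and the names LineDocuments, splitSentences and documents are made up for illustration. None of this is DL4J API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineDocuments {

    /** Splits one line (one document) into its sentences. */
    static List<String> splitSentences(String line) {
        List<String> sentences = new ArrayList<>();
        // Stand-in for the UIMA sentence annotator (assumption, not DL4J API).
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(line);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = line.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    /** Returns one inner list per non-blank input line, in input order. */
    static List<List<String>> documents(List<String> lines) {
        List<List<String>> docs = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines rather than emit empty documents
            }
            docs.add(splitSentences(line));
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a file with one document per line.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        List<List<String>> docs = documents(lines);
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("document " + i + ": " + docs.get(i));
        }
    }
}
```

To get POS tags, the body of splitSentences is the place to call whatever tagger you use on the single line; the outer loop in documents is what preserves the one-document-per-line structure.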
