
I currently use spaCy to traverse the dependency tree and generate entities.

nlp = get_spacy_model(detect_lang(unicode_text))
doc = nlp(unicode_text)

entities = set()
for sentence in doc.sents:

  # traverse the tree, picking up entities
  for token in sentence.root.subtree:
    # pick entities using some pre-defined rules
    ...

entities.discard('')
return entities

Are there any good Java alternatives for spaCy?

I am looking for libs which generate the Dependency Tree as is done by spaCy.

EDIT:

I looked into Stanford Parser. However, it generated the following parse tree:

                     ROOT
                      |
                      NP
       _______________|_________
      |                         NP
      |                _________|___
      |               |             PP
      |               |     ________|___
      NP              NP   |            NP
  ____|__________     |    |     _______|____
 DT   JJ    JJ   NN  NNS   IN   DT      JJ   NN
 |    |     |    |    |    |    |       |    |
the quick brown fox jumps over the     lazy dog

However, I am looking for a tree structure like spaCy does:

                             jumps_VBZ
   __________________________|___________________
  |       |        |         |      |         over_IN
  |       |        |         |      |            |
  |       |        |         |      |          dog_NN
  |       |        |         |      |     _______|_______
The_DT quick_JJ brown_JJ   fox_NN  ._. the_DT         lazy_JJ
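For anyone comparing the two outputs: spaCy's word-based view boils down to one head index and one dependency label per token, so traversing the tree really is just traversing the sentence. A minimal self-contained sketch of that structure in Java (the `Tok` record, the `headWord` helper, and the hand-built sentence are illustrative, not any library's API):

```java
import java.util.List;

public class DepTreeSketch {
    // Each token stores its head's index (-1 for the root) and a dependency
    // label, mirroring spaCy's token.head / token.dep_ model.
    record Tok(String text, String tag, int head, String dep) {}

    // Resolve the surface form of a token's head ("ROOT" for the root).
    static String headWord(List<Tok> sent, int i) {
        int h = sent.get(i).head();
        return h < 0 ? "ROOT" : sent.get(h).text();
    }

    public static void main(String[] args) {
        List<Tok> sent = List.of(
                new Tok("The",   "DT",  3,  "det"),
                new Tok("quick", "JJ",  3,  "amod"),
                new Tok("brown", "JJ",  3,  "amod"),
                new Tok("fox",   "NN",  4,  "nsubj"),
                new Tok("jumps", "VBZ", -1, "ROOT"),
                new Tok("over",  "IN",  4,  "prep"),
                new Tok("the",   "DT",  8,  "det"),
                new Tok("lazy",  "JJ",  8,  "amod"),
                new Tok("dog",   "NN",  5,  "pobj"));

        // Traversing the tree is just iterating over the sentence.
        for (int i = 0; i < sent.size(); i++) {
            Tok t = sent.get(i);
            System.out.printf("%s_%s <-%s- %s%n",
                    t.text(), t.tag(), t.dep(), headWord(sent, i));
        }
    }
}
```

Any Java parser whose output can be flattened into this per-token (head, label) form supports the same kind of rule-based traversal as the spaCy snippet above.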
jgp
vin

4 Answers


You're looking for the Stanford Dependency Parser. Like most of the Stanford tools, it is also bundled with Stanford CoreNLP under the depparse annotator. Other parsers include the Malt parser (a feature-based shift-reduce parser) and Ryan McDonald's MST parser (an accurate but slower maximum-spanning-tree parser).

Gabor Angeli
  • Yes, and using that I got the above tree ^^. However, I need some help interpreting that and could not find any documentation. spaCy makes it very easy by tokenizing words, so traversing the tree in spaCy is actually traversing the sentence, whereas the dependencies from the Stanford parser are not word-based. Can you point me to some documentation, so I can convert the Stanford Parser tree into something like what spaCy generates? – vin Dec 22 '16 at 07:05
  • Stanford has both a [constituency parser](http://nlp.stanford.edu/software/lex-parser.shtml) and a [dependency parser](http://nlp.stanford.edu/software/nndep.shtml). And, confusingly, the constituency parser can also convert to dependency parses. Spacy's parser outputs dependency parses, and you're currently trying to use CoreNLP's constituency parser. See: http://stanfordnlp.github.io/CoreNLP/depparse.html (try getting a `SemanticGraph` object from the `BasicDependenciesAnnotation` annotation on a sentence) – Gabor Angeli Dec 22 '16 at 07:11
  • 1
    You can also try using the [Simple API](http://stanfordnlp.github.io/CoreNLP/simple.html), which will internally use the dependency parser. – Gabor Angeli Dec 22 '16 at 07:12

Another way to integrate with Java and other languages is to use a spaCy REST API. For example, https://github.com/jgontrum/spacy-api-docker provides a Dockerization of a spaCy REST API.
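From the Java side, this amounts to POSTing text to the container and parsing the JSON reply. A sketch using the JDK's built-in `java.net.http` client, assuming the service listens on localhost:8080; the `/dep` route and JSON body shape here are assumptions, so check the linked repo's README for the actual endpoints:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SpacyRestClient {
    // Build a POST request against a locally running spaCy REST container.
    // The /dep route and the JSON body are illustrative assumptions.
    static HttpRequest depRequest(String host, int port, String text) {
        String body = "{\"text\": \"" + text.replace("\"", "\\\"") + "\"}";
        return HttpRequest.newBuilder(
                        URI.create("http://" + host + ":" + port + "/dep"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = depRequest("localhost", 8080,
                "The quick brown fox jumps over the lazy dog.");
        // Uncomment once the container is actually running:
        // HttpResponse<String> resp = HttpClient.newHttpClient()
        //         .send(req, HttpResponse.BodyHandlers.ofString());
        // System.out.println(resp.body());
        System.out.println(req.uri());
    }
}
```

The trade-off is an extra network hop per document, which matters for latency-sensitive paths (see the comment on the last answer below).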

fellahst

I recently released spaCy4j, which mimics spaCy's Token container objects and integrates with spacy-server or CoreNLP.

Once you have a running Docker container of spacy-server (very easy to set up), it's as easy as:

// Create a new spacy-server adapter with host and port matching a running instance of spacy-server.
SpaCyAdapter adapter = SpaCyServerAdapter.create("localhost", 8080);

// Create a new SpaCy object. It is thread safe and should be reused across our app
SpaCy spacy = SpaCy.create(adapter);

// Parse a doc
Doc doc = spacy.nlp("My head feels like a frisbee, twice its normal size.");

// Inspect tokens
for (Token token : doc.tokens()) {
    System.out.printf("Token: %s, Tag: %s, Pos: %s, Dependency: %s%n", 
            token.text(), token.tag(), token.pos(), token.dependency());
}

Feel free to reach out via GitHub with any questions.

guyman

spaCy can be run from a Java program.

First, create the environment by executing the following commands from a command prompt:

python3 -m venv env
source ./env/bin/activate 
pip install -U spacy
python -m spacy download en
python -m spacy download de

Create a bash script, spacyt.sh, with the following commands, parallel to the env folder:

#!/bin/bash
# activate the env created above (re-creating it on every run is unnecessary)
source ./env/bin/activate
python test1.py

Place the spaCy code in a Python script, test1.py:

import spacy

print('This is a test script of spacy')
nlp = spacy.load("en_core_web_sm")
doc = nlp(u"This is a sentence")
print([(w.text, w.pos_) for w in doc])

# instead of print, we can write the output to a file for further processing

In the Java program, run the bash script:

String cmd = "./spacyt.sh";

try {
    Process p = Runtime.getRuntime().exec(cmd);
    p.waitFor();
    System.out.println("cmdT executed!");
} catch (Exception e) {
    e.printStackTrace();
}
  • This isn't practical when productizing the code for latency-critical paths, e.g. service backends for suggestion engines. For this particular project I did end up writing the backend in Python :) – vin Mar 27 '19 at 10:41