
I am trying to incorporate spaCy's dependency parser into legacy Java code through a web API.

All other components (tokenizer, tagger, merged_words, NER) are handled by the legacy NLP code. I am only interested in applying the dependency parser, along with the dependency rule matcher, of spaCy 3.

I have tried the following approach:

  1. Creating a new Doc object using https://spacy.io/api/doc#init.
import spacy
from spacy.tokens import Doc

# nlp is an already-loaded spaCy pipeline; only its vocab is used here
sent = ["The heating_temperature was found to be 500 C"]  # original sentence, for reference
words = ["The", "heating_temperature", "was", "found", "to", "be", "500", "C"]
spaces = [True, True, True, True, True, True, True, False]
tags = ["DT", "NN", "VBD", "VBN", "TO", "VB", "CD", "NN"]
ents = ["O", "I-PARAMETER", "O", "O", "O", "O", "I-VALUE", "O"]
doc = Doc(nlp.vocab, words=words, spaces=spaces, tags=tags, ents=ents)
  2. Creating an NLP pipeline with only the parser.
# could use nlp.blank("en") instead
nlp2 = spacy.load("en_core_web_sm", exclude=["attribute_ruler", "lemmatizer", "ner", "parser", "tagger"])
pipeWithParser = nlp2.add_pipe("parser", source=spacy.load("en_core_web_sm"))
processed_dep = pipeWithParser(doc)  # similar to the example in https://spacy.io/api/tagger#call
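
To view the resulting parse as text rather than a rendered tree, a snippet along these lines could be used (a hypothetical inspection step, not part of the original post):

for token in processed_dep:
    print(token.text, token.dep_, token.head.text)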

However, I am getting the following dependency tree

[image: dependency tree]

where every word is attached to the first word with an nmod relation.

What am I missing? I could use spaCy's tagger too if required. I tried including the tagger using the same approach as above, but all tags came out as 'NN'.

hunsvadis
  • Instead of using pipeWithParser(doc), if I use nlp2(sentence) I get the correct output, but I need to pass in a Doc object, which nlp2() does not allow. The method I called is similar to the example in https://spacy.io/api/tagger#call, so I assumed it would work. – hunsvadis Sep 17 '21 at 05:46

1 Answer


The parser component in en_core_web_sm depends on the tok2vec component, so you need to run tok2vec on the doc before running the parser so that the parser has the right input.

doc = nlp2.get_pipe("tok2vec")(doc)
doc = nlp2.get_pipe("parser")(doc)
aab
  • Thank you for the answer. But tok2vec is included in the pipeline. – hunsvadis Sep 17 '21 at 09:23
  • But not if you run `pipeWithParser(doc)` on the plain doc. You need `doc = nlp2.get_pipe("tok2vec")(doc)` first. – aab Sep 17 '21 at 09:26
  • Yes, that worked. Thanks. Since your latest comment adds clarity, if you could please add this comment to your existing answer too. I will accept it. – hunsvadis Sep 17 '21 at 11:10
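
The same fix should also cover the tagger issue mentioned in the question: in en_core_web_sm the tagger listens to the same shared tok2vec, so running tok2vec on the doc before the tagger should produce real tags instead of uniform 'NN'. A minimal sketch, assuming the tagger was not excluded when loading nlp2 and the Doc is built without pre-set tags:

doc = Doc(nlp2.vocab, words=words, spaces=spaces)
doc = nlp2.get_pipe("tok2vec")(doc)   # run tok2vec first, as in the accepted answer
doc = nlp2.get_pipe("tagger")(doc)    # requires "tagger" not to be in the exclude list
doc = nlp2.get_pipe("parser")(doc)
print([token.tag_ for token in doc])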