I am trying to serialize a Doc object from spaCy. It looks like not all of the object hierarchy is getting serialized. Basically, I want to serialize this object so I can send it over a REST call.

A simple test case is given below:

import spacy
import jsonpickle

nlp = spacy.load('en_core_web_sm')
print(type(nlp))

text = "This is United States"
doc = nlp(text)
print('Output from noun_chunks before Serialization:')
for chunk in doc.noun_chunks:
    print(chunk)

frozen = jsonpickle.encode(doc)

doc = jsonpickle.decode(frozen)
print(type(doc))

print('Output from noun_chunks after SerDe:')
for chunk in doc.noun_chunks:
    print(chunk)

Error:

> Traceback (most recent call last):
>   File "tests/temp.py", line 19, in <module>
>     for chunk in doc.noun_chunks:
>   File "doc.pyx", line 569, in noun_chunks
> ValueError: [E029] noun_chunks requires the dependency parse, which requires a statistical model to be installed and loaded. For more info, see the documentation: https://spacy.io/usage/models
>
> Process finished with exit code 1
Prateek Dorwal

1 Answer

The spaCy documentation covers this case. Basically, use the standard pickle library instead of jsonpickle, and be aware that the whole Doc object is pickled (including the shared vocabulary it references), not only the text. Your code would then look like this:

import pickle

import spacy

nlp = spacy.load('en_core_web_sm')
text = "This is United States"
doc = nlp(text)
doc_data = pickle.dumps(doc)  # bytes; restore with pickle.loads(doc_data)

Find an example with code details here: https://spacy.io/usage/saving-loading#pickle

Another option is to call doc.to_json() or doc.to_dict() and do your own serialization of the result from there.

CodeComa