I followed the guidelines on how to train a neural coreference model using NeuralCoref. I now have a model, but cannot figure out how to use it in spaCy.

The following example from the manual does not describe how to load a custom model:

# Load your usual SpaCy model (one of SpaCy English models)
import spacy
nlp = spacy.load('custom-danish-spacy-model')

# Add neural coref to SpaCy's pipe
import neuralcoref
neuralcoref.add_to_pipe(nlp)

# You're done. You can now use NeuralCoref as you usually manipulate a SpaCy document annotations.
doc = nlp(u'A sentence in Danish. Another sentence in the same language.')

EDIT: I tried putting the trained model (produced by running python -m neuralcoref.train.learn --train ./data/train/ --eval ./data/dev/) in the NeuralCoref cache folder and running the code above. It produced the following error:

  return f(*args, **kwds)
/home/johan/Code/spacy-neuralcoref/venv/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: spacy.vocab.Vocab size changed, may indicate binary incompatibility. Expected 96 from C header, got 104 from PyObject
  return f(*args, **kwds)
Traceback (most recent call last):
  File "custom_model_test.py", line 5, in <module>
    neuralcoref.add_to_pipe(nlp)
  File "/home/johan/Code/spacy-neuralcoref/neuralcoref/neuralcoref/__init__.py", line 42, in add_to_pipe
    coref = NeuralCoref(nlp.vocab, **kwargs)
  File "neuralcoref.pyx", line 554, in neuralcoref.neuralcoref.NeuralCoref.__init__
  File "neuralcoref.pyx", line 947, in neuralcoref.neuralcoref.NeuralCoref.from_disk
  File "/home/johan/Code/spacy-neuralcoref/venv/lib/python3.6/site-packages/thinc/neural/_classes/model.py", line 355, in from_bytes
    data = srsly.msgpack_loads(bytes_data)
  File "/home/johan/Code/spacy-neuralcoref/venv/lib/python3.6/site-packages/srsly/_msgpack_api.py", line 29, in msgpack_loads
    msg = msgpack.loads(data, raw=False, use_list=use_list)
  File "/home/johan/Code/spacy-neuralcoref/venv/lib/python3.6/site-packages/srsly/msgpack/__init__.py", line 60, in unpackb
    return _unpackb(packed, **kwargs)
  File "_unpacker.pyx", line 199, in srsly.msgpack._unpacker.unpackb
srsly.msgpack.exceptions.ExtraData: unpack(b) received extra data.
sophros
Johan

2 Answers

Your problem has more to do with msgpack usage than with spaCy. I suggest you review "How to import text from CoNNL format with named entities into spaCy, infer entities with my model and write them to the same dataset (with Python)?" to get more clarity about the origin of the issue.

It may well be that, in your setup, the format written by the code that stored the model is incompatible with the format expected by the code that loads it. Without the exact versions of spaCy and NeuralCoref used for training and storing the model, though, it is hard to tell.
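To make the version question concrete, here is a minimal sketch for collecting the installed versions of the packages that appear in the traceback, so they can be posted alongside the error. The package list and the helper names `installed_version` and `report_versions` are my own choices for illustration, not part of spaCy or NeuralCoref.

```python
# Packages that show up in the traceback or in the install instructions.
# This list is an assumption; extend it as needed.
PACKAGES = ["spacy", "neuralcoref", "thinc", "srsly", "cymem"]


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Python < 3.8 has no importlib.metadata; fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report_versions(packages=PACKAGES):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this both in the environment used for training and in the one used for loading would show whether the two differ.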

sophros

You can get it to run by downgrading spaCy to 2.0.12: first install cymem==1.31.2, then spacy==2.0.12. Keep the installation order in mind when installing NeuralCoref.

!pip install https://github.com/huggingface/neuralcoref-models/releases/download/en_coref_md-3.0.0/en_coref_md-3.0.0.tar.gz
!pip install cymem==1.31.2 spacy==2.0.12