I'm trying to load a custom model called 'ru2' into spaCy (for NLP processing).

It can be found here: https://github.com/buriy/spacy-ru

The problem is when I call the function

nlp = spacy.load('ru2')
doc = nlp(text)

I see the error

C:\ProgramData\Anaconda3\lib\importlib\_bootstrap.py:205: RuntimeWarning: spacy.tokens.span.Span size changed, may indicate binary incompatibility. Expected 72 from C header, got 80 from PyObject
  return f(*args, **kwds)
Traceback (most recent call last):
  File "C://.../nlp/src/ie/main.py", line 125, in <module>
    main(examp_dict['Poroshenko'])
  File "C://.../nlp/src/ie/main.py", line 92, in main
    nlp = spacy.load('ru2')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 133, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 791, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 630, in from_disk
    reader(path / key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\language.py", line 781, in <lambda>
    deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
  File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
  File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
  File "C:\ProgramData\Anaconda3\lib\site-packages\spacy\util.py", line 606, in from_bytes
    msg = srsly.msgpack_loads(bytes_data)
  File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\_msgpack_api.py", line 29, in msgpack_loads
    msg = msgpack.loads(data, raw=False, use_list=use_list)
  File "C:\ProgramData\Anaconda3\lib\site-packages\srsly\msgpack\__init__.py", line 60, in unpackb
    return _unpackb(packed, **kwargs)
  File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
TypeError: unhashable type: 'list'

I searched the Internet for similar questions, but none of the solutions I found worked for me.

I use

  • msgpack==0.5.6 (even downgraded as suggested in the link above)
  • spacy==2.1.4
eyllanesc
Luxor

3 Answers

4

This is from the spaCy troubleshooting guide at https://spacy.io/usage#troubleshooting:

If you’re training models, writing them to disk, and versioning them with git, you might encounter this error when trying to load them in a Windows environment. This happens because a default install of Git for Windows is configured to automatically convert Unix-style end-of-line characters (LF) to Windows-style ones (CRLF) during file checkout (and the reverse when committing). While that’s mostly fine for text files, a trained model written to disk has some binary files that should not go through this conversion. When they do, you get the error above. You can fix it by either changing your core.autocrlf setting to "false", or by committing a .gitattributes file to your repository to tell git on which files or folders it shouldn’t do LF-to-CRLF conversion, with an entry like path/to/spacy/model/** -text. After you’ve done either of these, clone your repository again.
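To check whether a cloned model directory may have been affected, a rough heuristic is to look for files in which every line feed byte is preceded by a carriage return. The sketch below is only an illustration: the function names are hypothetical, it uses nothing from spaCy, and it can produce false positives (for example, text files that legitimately use CRLF, or small binary files that match by coincidence).

```python
from pathlib import Path


def looks_crlf_converted(data: bytes) -> bool:
    """Heuristic only: True if the data contains at least one LF byte
    and every LF byte is immediately preceded by a CR byte."""
    lf_count = data.count(b"\n")
    crlf_count = data.count(b"\r\n")
    return lf_count > 0 and lf_count == crlf_count


def suspicious_model_files(model_dir: str) -> list:
    """Return the relative paths of files under model_dir that match the heuristic."""
    root = Path(model_dir)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file() and looks_crlf_converted(path.read_bytes())
    )
```

Calling suspicious_model_files("ru2") on the checked-out model directory lists candidate files. If binary files such as the tokenizer data show up, the line-ending conversion described above is a plausible cause.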

Junpeng He
  • This was exactly the issue I had. However, after updating the .gitattributes file to indicate the model was binary, I had to re-checkout the repo to get the files with the correct line endings. Just updating the .gitattributes file didn't update the files which were already checked out. – Matt H Oct 11 '22 at 22:03
1

It might be because the version of spaCy used to generate your model is not the same as the version of spaCy you have installed. (I don't know, of course; I'm just mentioning it in case it helps.)
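One way to check this, assuming the model directory contains a meta.json file with a "spacy_version" entry (as packaged spaCy models typically do), is to read that entry and compare it with the installed version. This is a minimal sketch; the function name is hypothetical and the model may not record this field at all.

```python
import json
from pathlib import Path


def model_spacy_requirement(model_dir: str):
    """Return the 'spacy_version' entry from the model's meta.json,
    or None if the file or the entry is missing."""
    meta_path = Path(model_dir) / "meta.json"
    if not meta_path.is_file():
        return None
    meta = json.loads(meta_path.read_text(encoding="utf8"))
    return meta.get("spacy_version")
```

Comparing the result of model_spacy_requirement("ru2") with spacy.__version__ (2.1.4 in the question) shows whether the model was built for a different spaCy release.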

Eric McLachlan
-1

Adding to the answer above, another quick fix is to manually download the repository as a ZIP archive instead of cloning it, so the model files never go through Git's line-ending conversion on checkout.