
I am using PyInstaller to package a Python script into an .exe. The script uses spaCy to load the en_core_web_sm model, which I have already downloaded locally by running python -m spacy download en_core_web_sm. The issue is that after PyInstaller packages my script, the model can't be found. I get the following error: Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory. I thought this might mean I needed to run the download command from within my script to make sure the model is present, but when the script downloads the model it just reports that the requirements are already satisfied. I also have a hook file that handles the hidden imports and is supposed to bring in the model as well:

from PyInstaller.utils.hooks import collect_all, collect_data_files

datas = []
datas.extend(collect_data_files('en_core_web_sm'))

# ----------------------------- SPACY -----------------------------
data = collect_all('spacy')

datas.extend(data[0])
binaries = data[1]
hiddenimports = data[2]

# ----------------------------- THINC -----------------------------
data = collect_all('thinc')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- CYMEM -----------------------------
data = collect_all('cymem')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- PRESHED -----------------------------
data = collect_all('preshed')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- BLIS -----------------------------

data = collect_all('blis')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- STDNUM -----------------------------

data = collect_all('stdnum')

datas.extend(data[0])
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- OTHER -------------------------------

hiddenimports += ['srsly.msgpack.util']

I use the following code to download the model and then to package the script with PyInstaller:

import os
import PyInstaller.__main__

os.system('python -m spacy download en_core_web_sm')
PyInstaller.__main__.run([path_to_script, '--onefile', '--additional-hooks-dir=.'])

The hook-spacy.py script is in the same directory as the script that is being packaged by PyInstaller.

All of this works if I run the script locally. It finds the model as it should. I only get this error if I try to package the script with PyInstaller and try to run the .exe.

I am using Python v3.8.7, PyInstaller v4.2, and spaCy v3.0.3 with en_core_web_sm v3.0.0.

sabo
  • See also this question (and good answer): https://stackoverflow.com/questions/67354667/packaging-spacy-model-with-pyinstaller-e050-cant-find-model?noredirect=1&lq=1 – Pux Mar 13 '22 at 23:28

1 Answer

When you use PyInstaller to collect data files into the bundle as you are doing here, the files are actually packed into the resulting exe itself. For Python code, PyInstaller handles this transparently when import statements are evaluated.

However, for data files you must handle this yourself. For instance, spacy is likely looking for the model in the current working directory. It won’t find your model because it is packed into the .exe instead and therefore isn’t present in the current working directory.

You will need to use this API:

https://pyinstaller.readthedocs.io/en/stable/spec-files.html#using-data-files-from-a-module

This allows you to read a data file from the exe that PyInstaller creates. You can then write it to the current working directory and then spacy should be able to find it.
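A minimal sketch of that read-then-write step, assuming the file can be fetched with the standard-library `pkgutil.get_data`. The helper name `extract_resource` is hypothetical, and so are any package or resource names you pass to it. A spaCy model is a directory of several files, so each one would need to be copied this way, and whether spacy then picks the model up from the written location is not verified here.

```python
import pkgutil
from pathlib import Path


def extract_resource(package: str, resource: str, dest_dir: str = ".") -> Path:
    """Read `resource` from `package` and write it under `dest_dir`.

    Hypothetical helper: pkgutil.get_data returns the file's bytes, or None
    when the package cannot be located or its loader cannot supply data.
    The bytes are written back to disk so path-based APIs can open them.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(
            f"{resource!r} is not available from package {package!r}"
        )
    # Resource names use '/' separators; Path handles them on every platform.
    target = Path(dest_dir) / resource
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

Calling `extract_resource("some_package", "data/file.bin")` would write `data/file.bin` below the current working directory and return its path, which can then be handed to any function that expects a path rather than bytes.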

David K. Hess
  • So the docs make it seem that this is the way to get the model: `pkgutil.get_data('en_core_web_sm', 'en_core_web_sm-3.0.0\\ner\\model')`. Does that look about right to you? If so, I'm not clear on what to pass to spacy.load() at that point, since pkgutil.get_data() seems to return a binary object while the load function wants a path. – sabo Mar 08 '21 at 15:07
  • I'm not sure if those are the correct call arguments, but if it is returning a binary object they are likely good. The simplest thing to do is take the returned binary data, write it to a file called en_core_web_sm in the current working directory, and then call spacy normally. – David K. Hess Mar 08 '21 at 15:39
  • Couldn't I also fix this issue by not specifying the `--onefile` argument for PyInstaller? That should create the files necessary in the same directory as the .exe at that point. – sabo Mar 08 '21 at 16:09
  • I think so but I believe you’ll need to treat the file as a “binary” instead of “data” to avoid packaging. – David K. Hess Mar 08 '21 at 17:32
  • I removed the `--onefile` argument and things are progressing. I'm having an issue with getting the site-packages location in my exe now. I'm importing `site` and calling `getsitepackages()` but I get the following error: `module 'site' has no attribute 'getsitepackages'`. – sabo Mar 08 '21 at 22:33
  • I don't think PyInstaller is going to provide you with an environment with a real site packages setup. What problem are you trying to solve with getsitepackages()? – David K. Hess Mar 09 '21 at 01:09
  • A custom package is grabbed from the site-packages location. Now I'm curious whether I can just add it to the hooks file. – sabo Mar 09 '21 at 04:03
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/229687/discussion-between-david-k-hess-and-sabo). – David K. Hess Mar 09 '21 at 13:40
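
The `getsitepackages()` problem raised in the comments can often be sidestepped by resolving paths relative to the bundle instead of site-packages. A minimal sketch, assuming PyInstaller's documented run-time attributes `sys.frozen` and `sys._MEIPASS`. The helper names `bundle_dir` and `resource_path` are hypothetical, and the layout of files inside the bundle depends on what the hook or spec file collects.

```python
import sys
from pathlib import Path


def bundle_dir(default: str = ".") -> Path:
    """Return the folder that holds bundled files.

    Inside a PyInstaller bundle, sys.frozen is set and sys._MEIPASS points
    at the folder containing the collected files. Outside a bundle, fall
    back to `default` (hypothetical parameter, e.g. the project directory).
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return Path(sys._MEIPASS)
    return Path(default).resolve()


def resource_path(relative: str, default: str = ".") -> Path:
    """Build a path to a bundled file or folder from a relative name."""
    return bundle_dir(default) / relative
```

Since `spacy.load()` accepts a path to a model directory, the result of `resource_path(...)` could be passed to it directly, provided the relative name matches where the model data actually ends up in the bundle.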