22

In my project I have spaCy as a dependency in my setup.py, but I also want to add a default model.

My attempt so far has been:

install_requires=['spacy', 'en_core_web_sm'],
dependency_links=['https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm'],

inside my setup.py, but both a regular pip install of my package and a pip install --process-dependency-links return:

pip._internal.exceptions.DistributionNotFound: No matching distribution found for en_core_web_sm (from mypackage==0.1)

I found this GitHub issue from AllenAI describing the same problem, with no solution.

Note that if I pip install the URL of the model directly, it works fine, but I want it installed as a dependency when my package is installed with pip install.

w4nderlust

3 Answers

24

You can use pip's recent support for PEP 508 URL requirements:

install_requires=[
    'spacy',
    'en_core_web_sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz',
],

Note that this requires you to build your project with up-to-date versions of setuptools and wheel (at least v0.32.0 for wheel; not sure about setuptools), and your users will only be able to install your project if they're using at least version 18.1 of pip.

More importantly, though, this is not a viable solution if you intend to distribute your package on PyPI; quoting pip's release notes:

As a security measure, pip will raise an exception when installing packages from PyPI if those packages depend on packages not also hosted on PyPI. In the future, PyPI will block uploading packages with such external URL dependencies directly.
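For reference, the requirement string used above can be assembled programmatically. This is only a sketch: `model_requirement` and `MODELS_BASE_URL` are hypothetical names, and the URL pattern is inferred from the single `en_core_web_sm-2.0.0` link in this answer, so other releases may not follow it.

```python
# Hypothetical helper: builds a PEP 508 direct-URL requirement for a spaCy model.
# The URL layout is an assumption inferred from the en_core_web_sm-2.0.0 link above.
MODELS_BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def model_requirement(name: str, version: str) -> str:
    """Return '<name> @ <url>' for the given model release."""
    archive = f"{name}-{version}"
    return f"{name} @ {MODELS_BASE_URL}/{archive}/{archive}.tar.gz"


INSTALL_REQUIRES = [
    "spacy",
    model_requirement("en_core_web_sm", "2.0.0"),
]

# In setup.py this list would be passed as:
#     setup(..., install_requires=INSTALL_REQUIRES)
```

The same PyPI restriction quoted above still applies; this only keeps the model name and version in one place.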

jwodder
  • Thank you for your answer. I tested it and it actually works fine. I will not adopt it, because I plan to release the package on PyPI, but I will mark it as correct anyway. What do you suggest for a PyPI package? I was thinking about catching spaCy's `FileNotFoundError` and printing an error message that suggests running `python -m spacy download en`. Would that be a good compromise? – w4nderlust Nov 21 '18 at 22:08
21

Here is my workaround for a PyPI-installable package (edited slightly for clarity):

import spacy
from sys import stderr

try:
    nlp = spacy.load('en')
except OSError:
    # Model not installed yet: download it once, then load it.
    print('Downloading language model for the spaCy POS tagger\n'
          "(don't worry, this will only happen once)", file=stderr)
    from spacy.cli import download
    download('en')
    nlp = spacy.load('en')

It's cumbersome, but at least it works without having to involve the user. I'm trying to convince the spaCy team to package the most important model files for PyPI.
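If you need this in more than one place, the same try/download/retry pattern can be wrapped in a function. The sketch below is an illustration, not part of spaCy: `load_or_download` and `get_nlp` are hypothetical names, and the loader and downloader are passed in as arguments so the logic can be exercised without spaCy installed. It catches `OSError`, which also covers the `FileNotFoundError` mentioned in the question, since that is a subclass. Whether a freshly downloaded model can be loaded in the same process may depend on the spaCy version.

```python
import sys
from typing import Any, Callable, TextIO


def load_or_download(
    name: str,
    load: Callable[[str], Any],
    download: Callable[[str], Any],
    log: TextIO = sys.stderr,
) -> Any:
    """Try load(name); on OSError, call download(name) once and retry."""
    try:
        return load(name)
    except OSError:
        print(f"Downloading spaCy model {name!r} (first run only)", file=log)
        download(name)
        return load(name)


def get_nlp(name: str = "en") -> Any:
    """Wire the helper to spaCy; imports are local so spaCy is only needed here."""
    import spacy
    from spacy.cli import download

    return load_or_download(name, spacy.load, download)
```

Calling `get_nlp()` once at startup would then replace the inline try/except block.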

  • Indeed, it seems this might be the only solution for publishing on PyPI for a while, given that [this issue](https://github.com/explosion/spaCy/issues/3536) was closed. – Garrett Jul 26 '20 at 02:18
1

Not sure if this works for you, but in setup.py you might try:

os.system('python -m spacy download en')

after calling setuptools.setup(...)

edit:

According to the spaCy docs, it looks like you can now add spaCy models to your requirements.txt via URL as well. You should then be able to import the model as a module where it is required:

import en_core_web_sm
nlp = en_core_web_sm.load()

Ref: https://spacy.io/usage/models
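If the model package might be missing at runtime, the import can be guarded so that users get an actionable message instead of a bare `ImportError`. This is a rough sketch along the lines of what the asker proposed in the comments; `import_model_package` is a hypothetical helper, not a spaCy function.

```python
import importlib
from types import ModuleType


def import_model_package(name: str = "en_core_web_sm") -> ModuleType:
    """Import an installed model package or fail with an install hint."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise RuntimeError(
            f"spaCy model package {name!r} is not installed; "
            f"try: python -m spacy download {name}"
        ) from exc


# Usage, assuming the model package is installed:
#     nlp = import_model_package("en_core_web_sm").load()
```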

Wes Doyle
  • I tried it but it doesn't seem to work. I removed mypackage and spaCy from my local environment with pip uninstall, then installed mypackage again with `pip install`. It installed mypackage and spaCy, and from the Python interpreter `import spacy` works fine, but `spacy.load('en')` doesn't: `FileNotFoundError: [Errno 2] No such file or directory: '/home/piero/dev/venv3/local/lib/python3.6/site-packages/spacy/data/en/__init__.py'`. So I guess pip didn't run the additional line in the `setup.py`. – w4nderlust Nov 21 '18 at 22:02
  • Not sure if you've seen this discussion, but it may be of some help: https://github.com/explosion/spaCy/issues/2676 – Wes Doyle Nov 21 '18 at 22:14
  • Thank you for the additional comment and the edit, Wes. I actually already install the model through pip rather than the usual `python -m spacy download en`, but it doesn't work in `setup.py` directly; you need to do it the way @jwodder described, which unfortunately doesn't work with PyPI. – w4nderlust Nov 23 '18 at 21:16