I am trying to install the en_core_web_sm model from spaCy on my Databricks cluster. Is there a way to install it without running the command below in a notebook?

%sh
python -m spacy download en_core_web_sm

Reasons I am asking for an alternative:

  • I want the model installed at the cluster level; the method above only installs it at the notebook level.
  • The code above requires the %sh magic command, and magic commands are not compatible with a notebook that I pass to dbutils.notebook.run.

Ideally I could use pip install, but I don't think that's possible. Note that I already have spaCy installed; this question is only about spaCy models.

newbie101
  • `en_core_web_sm` is not a module but only data, and `pip` is only for installing modules. Someone would have to create a module with the data bundled directly in it (but as I remember, the data file is big), or a module that automatically downloads the data during installation. – furas Sep 08 '22 at 12:44
  • This module can download the data in code: `from spacy.cli import download` and `download('en_core_web_sm')`, but I don't know if this resolves your problem. See [pip - How to place Spacy en\_core\_web\_md model in Python package - Stack Overflow](https://stackoverflow.com/questions/62728854/how-to-place-spacy-en-core-web-md-model-in-python-package) – furas Sep 08 '22 at 12:47
  • @furas I don't think this method will help install the model on the whole cluster, as it's notebook-specific. I have found an answer to my question [here](https://stackoverflow.com/questions/72307171/error-while-importing-en-core-web-sm-for-spacy-in-azure-databricks) – newbie101 Sep 08 '22 at 13:51
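
For anyone landing here with the same pip question, here is a rough sketch of the idea, not a verified Databricks recipe. As far as I know, spaCy models are published as ordinary wheels on the explosion/spacy-models GitHub releases page, so pip can install one from a direct URL. The URL template, the `3.4.0` version, and the helper names (`model_wheel_url`, `pip_install_command`, `install_model`) are my assumptions for illustration; pick the model version that matches your installed spaCy.

```python
import subprocess
import sys

# Assumed URL pattern for model wheels on the explosion/spacy-models
# GitHub releases page; check it against the release you actually need.
MODEL_URL_TEMPLATE = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "{name}-{version}/{name}-{version}-py3-none-any.whl"
)


def model_wheel_url(name: str, version: str) -> str:
    """Build the (assumed) download URL for a spaCy model wheel."""
    return MODEL_URL_TEMPLATE.format(name=name, version=version)


def pip_install_command(url: str) -> list:
    """Return the pip command that installs a package from a direct URL."""
    # sys.executable makes pip target the interpreter running this code.
    return [sys.executable, "-m", "pip", "install", url]


def install_model(name: str, version: str) -> None:
    """Install the model with pip; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(model_wheel_url(name, version)))


if __name__ == "__main__":
    # Hypothetical version; it must be compatible with your spaCy install.
    print(model_wheel_url("en_core_web_sm", "3.4.0"))
    # install_model("en_core_web_sm", "3.4.0")  # uncomment to run pip
```

Since this is plain Python with no %sh magic, it should not hit the dbutils.notebook.run limitation. For a true cluster-level install, the same URL could presumably be handed to whatever mechanism pip-installs packages on every node (for example a cluster init script), but I have not tested that on Databricks.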
