I am using an EMR notebook attached to my cluster for some experimentation purposes. I needed to install some python modules for testing, specifically spacy and it's data module en_core_web_sm.
I ssh'ed into the master and core nodes and downloaded the modules individually. However I am not able to import from the my EMR notebook. I get the following error :
An error was encountered:
No module named 'spacy'
Traceback (most recent call last):
ModuleNotFoundError: No module named 'spacy'
I know there is a way to install them just for the scope of EMR notebook, but this wouldn't suffice in a production scenario, so please avoid answers which suggest notebook installing as mentioned in this guide : https://aws.amazon.com/blogs/big-data/install-python-libraries-on-a-running-cluster-with-emr-notebooks/
Please let me know if I am missing some setup steps. Appreciate your response.