How can I install a module to a specific Jupyter kernel without use of ! or terminal?

Question

I am using JupyterHub on an EMR cluster, and pandas is not installed in the PySpark or PySpark3 kernels. These kernels also disallow the use of !. I tried installing it with:

import pip
pip.main(['install', 'pandas'])

But this raises ValueError: I/O operation on closed file.

When I open the terminal kernel, pandas is already installed.

Please let me know if there are other ways to install to a specific kernel.

you should apply a bootstrap .py script (containing all your modules) when creating the emr cluster because all modules need to be installed on every node (if you intend to use them with spark) — thePurplePython, Aug 20 '19 at 02:50
I have added a bootstrap with pandas, but it does not show up in the PySpark or PySpark3 kernel; it only shows up in the Python kernel — Collin Cunningham, Aug 20 '19 at 13:04
how are you installing in the bootstrap? can you paste the command? — thePurplePython, Aug 20 '19 at 13:35
I have the shebang and then the following command: `sudo pip install scipy scikit-learn pandas pyarrow` — Collin Cunningham, Aug 20 '19 at 16:42

score 1 · Accepted Answer · answered Aug 20 '19 at 19:21

I faced a similar problem, and this resolved it: install the packages with Python 3's pip in the bootstrap, then point PySpark at Python 3.

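# bootstrap action: install with the Python 3 interpreter's pip
sudo python3 -m pip install <packages>

# set in $SPARK_HOME/conf/spark-env.sh or use the config.json template for EMR
# so that the driver and the executors both run Python 3
export PYSPARK_DRIVER_PYTHON=python3
export PYSPARK_PYTHON=python3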

Reference: AWS EMR - ModuleNotFoundError: No module named 'pyarrow'
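
For the config.json route, the same two exports can be expressed as an EMR configuration object. The sketch below builds that JSON in Python. It assumes the `spark-env` classification with a nested `export` classification, and it assumes `python3` resolves on every node (an absolute path may be safer on some images). The function name `spark_env_config` is hypothetical, chosen only for illustration.

```python
import json

# Assumption: "python3" is on PATH on every node, matching the exports above.
PYTHON = "python3"


def spark_env_config(python=PYTHON):
    """Return an EMR configuration list that exports the PySpark interpreter."""
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {
                        "PYSPARK_PYTHON": python,
                        "PYSPARK_DRIVER_PYTHON": python,
                    },
                }
            ],
        }
    ]


# Print the JSON so it can be saved to a file and supplied at cluster creation.
print(json.dumps(spark_env_config(), indent=2))
```

The printed JSON would be supplied as the cluster's software configuration when the cluster is created, alongside the bootstrap action that installs the packages.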

thePurplePython

Wow, turns out you were right! It was a Python 2 vs 3 issue. – Collin Cunningham Aug 22 '19 at 16:26
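
A quick way to spot this kind of Python 2 vs 3 mismatch is to ask the kernel which interpreter it is running and whether that interpreter can import the module. The following is a minimal sketch, not part of the original answer. The helper name `describe_interpreter` is hypothetical, and the code sticks to constructs that work on both Python 2 and Python 3 so that it can run in either kernel.

```python
import sys


def describe_interpreter(module_name="pandas"):
    """Report which Python runs this cell and whether it can import module_name."""
    try:
        __import__(module_name)
        found = True
    except ImportError:
        found = False
    return {
        "executable": sys.executable,
        "major_version": sys.version_info[0],
        "module_found": found,
    }


print(describe_interpreter())
```

If the reported major version differs from the interpreter that the bootstrap's pip installed into, the packages went to the other Python, which would match the symptom described in the question. Note that this only reports on the interpreter that executes the cell, not on every node in the cluster.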
