UPDATE THE SPARK ENVIRONMENT TO USE PYTHON 3.7:
Open a new terminal and type the following command:
export PYSPARK_PYTHON=python3.7
This ensures that the worker nodes use Python 3.7 (the same version as the driver) rather than the default Python 3.4.
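Note that an export only lasts for the current shell session. A minimal sketch of setting and confirming the variable (assuming bash; PYSPARK_DRIVER_PYTHON is an optional extra, not part of the steps above, that pins the driver to the same interpreter — to persist either one, append the same export line to ~/.bashrc):

```shell
# Set the worker interpreter for the current session; to persist,
# add the same lines to ~/.bashrc.
export PYSPARK_PYTHON=python3.7
# Optionally pin the driver too (assumption: you want the driver and
# workers on identical versions).
export PYSPARK_DRIVER_PYTHON=python3.7
echo "$PYSPARK_PYTHON"   # confirm the variable is set
```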
DEPENDING ON THE PYTHON VERSIONS YOU HAVE, YOU MAY NEED TO INSTALL OR UPDATE ANACONDA:
(To install see: https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart)
Make sure you have Anaconda 4.1.0 or higher. Check your conda version by typing into a new terminal:
conda --version
If you are below Anaconda 4.1.0, type
conda update conda
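If you would rather script the check than eyeball it, here is a sketch that compares versions with `sort -V` (the `version_ge` helper is hypothetical, not part of conda; it assumes `conda --version` prints something like "conda 4.1.0"):

```shell
# version_ge A B succeeds when version A is at least version B.
# sort -V orders version strings naturally; if B is the smallest of
# the pair, A is new enough.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

current=$(conda --version 2>/dev/null | awk '{print $2}')
if version_ge "$current" "4.1.0"; then
    echo "conda $current is recent enough"
else
    echo "conda is older than 4.1.0 (or missing); run: conda update conda"
fi
```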
- Next, check whether the nb_conda_kernels package is installed by typing
conda list
- If you don’t see nb_conda_kernels in the list, type
conda install nb_conda_kernels
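The check-then-install step can be collapsed into one guarded command, a sketch along these lines (`-y` just skips conda's confirmation prompt; the PATH guard is an addition so the snippet degrades gracefully where conda is absent):

```shell
# Install nb_conda_kernels only when conda's package list doesn't
# already contain it.
if ! command -v conda >/dev/null; then
    echo "conda not found on PATH"
elif conda list | grep -q '^nb_conda_kernels'; then
    echo "nb_conda_kernels already installed"
else
    conda install -y nb_conda_kernels
fi
```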
- If you are using Python 2 and want a separate Python 3 environment, type the following:
conda create -n py36 python=3.6 ipykernel
py36 is the name of the environment. You could name it anything you want.
Alternatively, if you are using Python 3 and want a separate Python 2 environment, you could type the following:
conda create -n py27 python=2.7 ipykernel
py27 is the name of the environment. It uses Python 2.7.
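After creating an environment you can confirm that conda registered it, sketched below for the py27 example above (`-y` skips the prompt; the PATH guard is an addition so the snippet also runs where conda is absent):

```shell
# Create the environment, then confirm it appears in conda's list of
# registered environments (one line per environment).
if command -v conda >/dev/null; then
    conda create -y -n py27 python=2.7 ipykernel
    conda env list | grep py27
else
    echo "conda not found on PATH"
fi
```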
- Ensure the Python versions installed successfully, then close the terminal. Open a new terminal and type
pyspark
You should see the new environments appear.
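To double-check which interpreter the workers will pick up without opening the full pyspark shell, you can inspect the variable directly (a sketch; the fallback to plain `python` mirrors Spark's default when the variable is unset):

```shell
# Show what PYSPARK_PYTHON resolves to, falling back to "python".
target="${PYSPARK_PYTHON:-python}"
echo "workers will use: $target"
command -v "$target" >/dev/null && "$target" --version || echo "$target not on PATH"
```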