
I'm facing a strange error when importing the Pandas library into my Zeppelin notebook. Here is the basic code that I have as part of my cell:

%python

import pandas as pd

df = pd.read_csv(r'target/youtube_videos.csv')
print(df)

I get the following Error:

Fail to execute line 3: import pandas as pd
Traceback (most recent call last):
  File "/tmp/1636039066525-0/zeppelin_python.py", line 153, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 3, in <module>
ModuleNotFoundError: No module named 'pandas'

I checked which Python my shell picks up:

%sh
python --version
python3-config --configdir

This gives me the following:

Python 3.7.0b3
/usr/lib/python3.8/config-3.8-x86_64-linux-gnu

I'm using Zeppelin 0.10.0.

EDIT:

I tried the following:

joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ zstart
Please specify HADOOP_CONF_DIR if USE_HADOOP is true
Zeppelin start                                             [  OK  ]
joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ python
Python 3.7.0b3 (default, Mar 30 2018, 04:35:22) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
>>> 

Pandas seems to be already installed:

joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ pip3 install pandas
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas in /usr/local/lib/python3.8/dist-packages (1.3.4)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/lib/python3/dist-packages (from pandas) (2.7.3)
Requirement already satisfied: numpy>=1.17.3 in /usr/lib/python3/dist-packages (from pandas) (1.17.4)
Requirement already satisfied: pytz>=2017.3 in /usr/lib/python3/dist-packages (from pandas) (2019.3)
WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ 
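The pip output puts pandas in a `python3.8` directory, while `python` reports 3.7.0b3, which suggests two separate installations. A minimal sketch (standard library only) that could be pasted into a `%python` paragraph to print which interpreter is actually running and whether the directory pip reported is on its module search path. The directory is copied from the pip output above and will differ on other machines; `search_path_report` is a hypothetical helper name:

```python
import os
import sys

# Directory that pip reported for pandas (taken from the pip output; adjust as needed).
PIP_TARGET = "/usr/local/lib/python3.8/dist-packages"


def search_path_report(target=PIP_TARGET):
    """Describe the running interpreter and whether `target` is on sys.path."""
    # Normalize entries so trailing slashes etc. don't cause false mismatches;
    # empty entries (the current directory) are skipped.
    normalized = {os.path.normpath(p) for p in sys.path if p}
    return {
        "executable": sys.executable,          # path of the running interpreter
        "version": sys.version.split()[0],     # e.g. "3.7.0b3"
        "target_on_sys_path": os.path.normpath(target) in normalized,
    }


for key, value in search_path_report().items():
    print(f"{key}: {value}")
```

If `target_on_sys_path` is `False`, the interpreter running the paragraph cannot see the copy of pandas that pip installed.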

I have even set the python interpreter in Zeppelin as below:

[Screenshot: Zeppelin python interpreter settings]

joesan

3 Answers


Are you sure you even have pandas installed? Unless Zeppelin uses its own Python, that would be the problem. Give pip3 install pandas a shot.

  • I have installed pandas already. Please see my edited post again! – joesan Nov 04 '21 at 17:14
  • Try `pip3 --version`. That'll tell us which Python pip is installing packages for. –  Nov 04 '21 at 17:19
  • joesan@joesan-InfinityBook-S-14-v5:~/Projects/Private/ml-projects/ml-data-preparation-sandbox$ pip3 --version pip 21.2.4 from /home/joesan/.local/lib/python3.8/site-packages/pip (python 3.8) – joesan Nov 04 '21 at 17:53

It looks like the Python interpreter used by Zeppelin isn't configured properly. You may have several Python installations: you are thinking of one, but Zeppelin uses another. Check the zeppelin.python property, then check whether pandas is installed in that Python (I suspect it isn't).

This parameter specifies "Path of the already installed Python binary. If python is not in your $PATH you can set the absolute directory (example : /usr/bin/python)"

By default, Zeppelin will use Python defined in zeppelin.python property to run Python process. The interpreter can use all modules already installed (with pip, easy_install...)

Then you need to install pandas for the interpreter that Zeppelin uses.

Alternatively, set this parameter to the path of a Python interpreter that already has pandas installed.
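The check described above can be scripted. This is a sketch under assumptions: the candidate paths are only examples (use the actual value of zeppelin.python on your machine), and `module_available` is a hypothetical helper. It launches each interpreter with `-c "import pandas"` and reports whether the import succeeded:

```python
import shutil
import subprocess
import sys


def module_available(interpreter, module_name="pandas"):
    """Return True if `interpreter` can import `module_name`, False if not.

    Returns None when the interpreter cannot be found or is not executable.
    """
    # shutil.which resolves bare names via PATH and validates absolute paths.
    resolved = shutil.which(interpreter)
    if resolved is None:
        return None
    result = subprocess.run(
        [resolved, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # A non-zero exit code means the import raised (e.g. ModuleNotFoundError).
    return result.returncode == 0


# Example candidates only; replace with the interpreters installed on your machine.
candidates = ["/usr/bin/python", "/usr/bin/python3", sys.executable]
for candidate in candidates:
    print(candidate, "->", module_available(candidate))
```

Whichever candidate prints `True` is a path that zeppelin.python could point to.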

Ihor Konovalenko
  • Strange, I have the zeppelin.python set to /usr/bin/python, but still it does not respect it and coughs out the same error. – joesan Nov 04 '21 at 18:18
  • I have edited my post with a screenshot of the zeppelin.python configuration. I don't think I'm the first to hit such a basic problem. Damn! – joesan Nov 04 '21 at 18:21
  • Maybe two Pythons are installed, and you installed `pandas` into one while Zeppelin uses the other. Manually run the interpreter specified in Zeppelin and check whether `pandas` is importable. – Ihor Konovalenko Nov 04 '21 at 18:23
  • I do have different versions of Python installed, but how do I go about now fixing this problem? – joesan Nov 04 '21 at 18:24
  • Like this: `/usr/bin/python -m pip install pandas`. Then restart the Python interpreter in Zeppelin.
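The same idea can be applied from inside Zeppelin: `sys.executable` is the interpreter running the paragraph, so invoking pip through it installs into that interpreter rather than into whichever `pip3` comes first on `PATH`. A minimal sketch; `pip_install` is a hypothetical helper, and its `dry_run` flag exists only so the command can be inspected before anything is installed:

```python
import subprocess
import sys


def pip_install(package, interpreter=sys.executable, dry_run=False):
    """Install `package` with the pip that belongs to `interpreter`.

    Equivalent to running `<interpreter> -m pip install <package>` in a shell.
    Returns the command as a list of arguments.
    """
    command = [interpreter, "-m", "pip", "install", package]
    if not dry_run:
        # Raises subprocess.CalledProcessError if pip exits with an error.
        subprocess.check_call(command)
    return command


# Show the command that would run for the interpreter executing this code.
print(" ".join(pip_install("pandas", dry_run=True)))
```

Once the printed interpreter path matches zeppelin.python, calling it with `dry_run=False` performs the install; the Zeppelin Python interpreter still needs a restart afterwards.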

For anyone who might be facing the same issues, here is how I solved it:

  1. Install pyenv
  2. Install python version 3.7.8 using pyenv
  3. Set the version 3.7.8 using the pyenv global command
  4. Set the zeppelin.python property to python
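After switching interpreters this way, a quick check in a `%python` paragraph can confirm that Zeppelin picked up the intended Python. The sketch below assumes the 3.7 series from step 2 and pandas as the only required module; `environment_problems` is a hypothetical helper that returns a list of problems (empty when everything matches):

```python
import importlib.util
import sys


def environment_problems(expected_version=(3, 7), required=("pandas",)):
    """List mismatches between the running interpreter and what is expected."""
    problems = []
    actual = tuple(sys.version_info[:2])
    if actual != tuple(expected_version):
        problems.append(
            f"running Python {actual[0]}.{actual[1]} from {sys.executable}, "
            f"expected {expected_version[0]}.{expected_version[1]}"
        )
    for name in required:
        # find_spec returns None when the module cannot be imported here.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module '{name}' is not importable")
    return problems


print(environment_problems() or "interpreter looks correct")
```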
joesan