
I am using a PySpark kernel installed through Apache Toree in Jupyter Notebook, with Anaconda v4.0.0 (Python 2.7.11). After getting a table from Hive, I want to use matplotlib/pandas to plot some graphs in the Jupyter notebook, following the tutorial below:

    %matplotlib inline
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # Set some Pandas options
    pd.set_option('display.notebook_repr_html', False)
    pd.set_option('display.max_columns', 20)
    pd.set_option('display.max_rows', 25)

    normals = pd.Series(np.random.normal(size=10))
    normals.plot()

I got stuck at the very first line: running `%matplotlib inline` fails with

    Name: Error parsing magics!
    Message: Magics [matplotlib] do not exist!
    StackTrace:

Looking at Toree's Magic and MagicManager, I realised that `%matplotlib` is handled by Toree's MagicManager instead of IPython's built-in magic command.

Is it possible for the Apache Toree PySpark kernel to use IPython's built-in magic commands instead?
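One possible stopgap, sketched under assumptions and not tested against Toree: if Toree's PySpark interpreter does not run inside an IPython shell, `get_ipython()` returns `None` there. The hypothetical helper `enable_inline_plots` below runs the `%matplotlib inline` magic through IPython's `run_line_magic` when an IPython shell is present, and otherwise switches matplotlib to the non-interactive `Agg` backend so that figures can at least be written to files.

```python
import matplotlib


def enable_inline_plots():
    """Hypothetical helper: use IPython's %matplotlib magic if possible.

    Returns "inline" when the magic was run through IPython, or "agg"
    when no IPython shell is available and the Agg backend was selected.
    """
    try:
        from IPython import get_ipython
        shell = get_ipython()  # None when not running inside an IPython shell
    except ImportError:
        shell = None  # IPython is not installed at all

    if shell is not None:
        # Programmatic equivalent of typing `%matplotlib inline` in a cell
        shell.run_line_magic("matplotlib", "inline")
        return "inline"

    # No IPython shell: fall back to a file-only backend
    matplotlib.use("Agg")
    return "agg"
```

Call it once at the top of the notebook in place of `%matplotlib inline`. When it returns `"agg"`, nothing is shown inline, so save figures explicitly, e.g. `normals.plot().figure.savefig("normals.png")`.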

Angletear
  • Install `matplotlib`? – zero323 Sep 19 '16 at 11:35
  • @zero323 I can import `matplotlib`, but when I try to run `%matplotlib inline` in the Jupyter notebook, the console shows `16/09/20 09:40:24 ERROR magic.MagicManager: No magic found for matplotlib`. Is there a way to get IPython magic to work? – Angletear Sep 20 '16 at 02:06

1 Answer

I found a workaround to get PySpark and magic commands working together: instead of installing the Toree PySpark kernel, I run PySpark directly in Jupyter Notebook.

  1. Download and install Anaconda2 4.0.0

  2. Download Spark 1.6.0 pre-built for Hadoop 2.6

  3. Append the following lines to ~/.bashrc, then run source ~/.bashrc to update the environment variables

    # added to run spark
    export PATH="{your_spark_dir}/spark/sbin:$PATH"
    export PATH="{your_spark_dir}/spark/bin:$PATH"

    # added to launch spark application in cluster mode
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre

    # next 2 lines are optional, needed only for a Spark cluster
    export HADOOP_CONF_DIR={your_hadoop_conf}/hadoop-conf
    export YARN_CONF_DIR={your_hadoop_conf}/hadoop-conf

    # added by Anaconda2 4.0.0 installer
    export PATH="{your_anaconda_dir}/Anaconda/bin:$PATH"

    # added to run pyspark in jupyter notebook
    export PYSPARK_DRIVER_PYTHON={your_anaconda_dir}/Anaconda/bin/jupyter
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='0.0.0.0' --NotebookApp.port=8888"
    export PYSPARK_PYTHON={your_anaconda_dir}/Anaconda/bin/python
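After running source ~/.bashrc, a quick check can catch a typo in step 3 before launching pyspark. The sketch below is a hypothetical helper, not part of Spark or Anaconda: it reports which of the variables exported above are missing and which of the executables they point to do not exist. The HADOOP/YARN variables are left out because they are optional.

```python
import os

# Variables from step 3 that must point at a single executable file
EXECUTABLE_VARS = ("PYSPARK_DRIVER_PYTHON", "PYSPARK_PYTHON")
# Variables that only need to be set (HADOOP_CONF_DIR/YARN_CONF_DIR are optional)
REQUIRED_VARS = EXECUTABLE_VARS + ("PYSPARK_DRIVER_PYTHON_OPTS", "JAVA_HOME")


def check_pyspark_env(env=None):
    """Return (missing, bad_paths) for the variables set in ~/.bashrc."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    bad_paths = [
        name
        for name in EXECUTABLE_VARS
        if env.get(name) and not os.path.isfile(env[name])
    ]
    return missing, bad_paths


if __name__ == "__main__":
    missing, bad_paths = check_pyspark_env()
    print("missing: %s" % ", ".join(missing) if missing else "all variables set")
    for name in bad_paths:
        print("%s does not point to a file: %s" % (name, os.environ[name]))
```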

Running the Jupyter Notebook

  1. Run pyspark --master=yarn --deploy-mode=client to start the notebook with PySpark running on the YARN cluster in client deploy mode

  2. Open a browser and go to http://IP_ADDRESS_OF_COMPUTER:8888
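Once the notebook is open, the PySpark shell startup should have predefined `sc` and `sqlContext` (in Spark 1.6 the latter is a `HiveContext` when Spark was built with Hive support), and `%matplotlib inline` can go in a cell of its own. The helper below is a hypothetical sketch of the original goal: it runs a Hive query, pulls the result to the driver with `DataFrame.toPandas()`, and plots one column. The query, table, and column names are placeholders.

```python
# Run `%matplotlib inline` in a cell of its own before this one.
# `sqlContext` is assumed to be predefined by the PySpark shell startup.


def plot_hive_column(sql_context, query, column):
    """Hypothetical helper: run a Hive query and plot one column.

    toPandas() collects the whole result onto the driver, so keep the
    query small (aggregate it or add a LIMIT) before calling this.
    """
    pdf = sql_context.sql(query).toPandas()  # Spark DataFrame -> pandas DataFrame
    series = pdf[column]
    series.plot()
    return series


# Example call (table and column names are placeholders):
# plot_hive_column(sqlContext, "SELECT value FROM my_table LIMIT 1000", "value")
```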

Disclaimer
This is only a workaround, not an actual fix for the problem. Please let me know if you find a way to get IPython's built-in magic commands, such as %matplotlib notebook, to work with the Toree PySpark kernel.

Angletear