
I am trying to use the graphframes library on Apache Zeppelin with the Spark (pyspark) interpreter, but I keep getting the error ModuleNotFoundError: No module named 'graphframes' whenever I try to import the graphframes module with from graphframes import *.

I have tried adding the --packages 'graphframes:graphframes:0.7.0-spark2.4-s_2.11' directive in the zeppelin-env.sh file, using the z.load('graphframes:graphframes:0.7.0-spark2.4-s_2.11') function, and adding graphframes as a dependency in the interpreter settings, but none of these attempts has worked.
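For reference, the zeppelin-env.sh entry I tried looked roughly like this (the exact wording is from memory, but the package coordinates are the ones above):

# attempted in zeppelin-env.sh; Zeppelin still could not import graphframes
export SPARK_SUBMIT_OPTIONS="--packages graphframes:graphframes:0.7.0-spark2.4-s_2.11"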

I have also tried adding a Spark repository to Zeppelin and then adding the Maven coordinates for graphframes under the dependencies section of the interpreter settings. This did not work either.

I am using Spark 2.4 with Scala 2.11 on Zeppelin 0.8.1, hosted on an EMR cluster.

I am able to use graphframes from the terminal using pyspark and the --packages directive mentioned above, so this seems to be a Zeppelin-related issue.
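For example, this is roughly the command that works from the terminal on the EMR master node:

# launch the pyspark shell with the graphframes package; 'from graphframes import *' succeeds here
pyspark --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11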

I am stumped as to what to try next. Any ideas on how I can get graphframes to work on Zeppelin?

Marxley
  • @user10958683 The issue behind this problem was different from the one in the linked question. The answers to that question were attempted but did not work. The answer here could help others who are facing a similar problem. – Marxley Jun 04 '19 at 18:54

1 Answer


I think the problem is your PYTHONPATH in Zeppelin. You can see the PYTHONPATH with:

import sys
print(sys.path)

It works in the pyspark console because the package is installed in a location that is already part of the PYTHONPATH. You can check that with:

import graphframes
print(graphframes.__file__)

So all you have to do is add the package to your PYTHONPATH. Add the following line to /etc/spark/conf/spark-defaults.conf (other approaches, such as passing the --packages parameter via SPARK_SUBMIT_OPTIONS, should work as well):

spark.jars.packages graphframes:graphframes:0.7.0-spark2.4-s_2.11

After that, add the following line to /etc/spark/conf/spark-env.sh to extend your PYTHONPATH (check the actual package location first):

export PYTHONPATH=$PYTHONPATH:/var/lib/zeppelin/.ivy2/jars/graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar
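If you are not sure where the jar actually landed (the /var/lib/zeppelin/.ivy2/jars/ path above is where it ended up on my EMR setup; it may differ on yours), something like this should show it:

# list the jars downloaded into Zeppelin's ivy cache and filter for graphframes
ls /var/lib/zeppelin/.ivy2/jars/ | grep graphframes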

Restart the Spark interpreter in Zeppelin to make sure that all changes are applied.

cronoik
  • This resolved the issue. Marked as the answer. Thank you very much. I would never have been able to come to this solution on my own. – Marxley Jun 04 '19 at 18:21