I am trying to use the graphframes library on Apache Zeppelin with the Spark (pyspark) interpreter, but I keep getting this error:
ModuleNotFoundError: No module named 'graphframes'
It occurs whenever I try to import the module with from graphframes import *.
I have tried adding the --packages 'graphframes:graphframes:0.7.0-spark2.4-s_2.11' directive in the zeppelin-env.sh file, calling z.load('graphframes:graphframes:0.7.0-spark2.4-s_2.11'), and adding graphframes as a dependency in the interpreter settings. None of these attempts worked.
I have also tried adding a Spark repository to Zeppelin and then adding the Maven coordinates for graphframes under the dependencies section of the interpreter settings. This did not work either.
I am using Spark 2.4 with Scala 2.11 on Zeppelin 0.8.1, hosted on an EMR cluster.
I am able to use graphframes from the terminal using pyspark and the --packages directive mentioned above, so this seems to be a Zeppelin-related issue.
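To compare the two environments, a small diagnostic along these lines could be run both in a Zeppelin %pyspark paragraph and in the terminal pyspark shell. This is only a sketch: module_status is a hypothetical helper name, and it assumes that in the working environment the graphframes jar (or package directory) shows up somewhere on sys.path.

```python
import importlib.util
import sys


def module_status(name):
    """Report whether `name` is importable and which sys.path entries mention it.

    `module_status` is a hypothetical helper, not part of Spark or Zeppelin.
    """
    spec = importlib.util.find_spec(name)
    return {
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
        # Assumption: a working setup exposes graphframes via a sys.path entry.
        "matching_paths": [p for p in sys.path if name in p],
    }


# Run the same call in both environments and compare the results.
print(module_status("graphframes"))
```

If the terminal shell reports a location and Zeppelin does not, the difference in sys.path should show which entry the Zeppelin interpreter is missing.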
I am not sure what else to try. Any ideas on how I can get graphframes to work on Zeppelin?