
Running spark-shell --packages "graphframes:graphframes:0.7.0-spark2.4-s_2.11" in the bash shell works, and I can successfully import graphframes 0.7, but when I try to use it in a Scala Jupyter notebook like this:

import scala.sys.process._
"spark-shell --packages \"graphframes:graphframes:0.7.0-spark2.4-s_2.11\""!
import org.graphframes._

it gives this error message:

<console>:53: error: object graphframes is not a member of package org
   import org.graphframes._

From what I can tell, this means the bash command runs, but the notebook still cannot find the retrieved package.

I am doing this on an EMR Notebook running a Spark Scala kernel.

Do I have to set some sort of Spark library path in the Jupyter environment?

Joe S

1 Answer


That simply shouldn't work. Your code merely attempts to start a new, independent Spark shell. Furthermore, Spark packages have to be loaded when the SparkContext is initialized for the first time.

You should either add the following (assuming the versions are correct)

spark.jars.packages graphframes:graphframes:0.7.0-spark2.4-s_2.11

to your Spark configuration files, or set the equivalent option via SparkConf / SparkSession.Builder.config before the SparkSession is initialized.
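
For example, here is a minimal sketch of the builder approach, assuming you control session creation in your environment (the app name is hypothetical; the package coordinate is the one from the question, so adjust it to match your Spark and Scala versions):

import org.apache.spark.sql.SparkSession

// Set spark.jars.packages before any SparkSession/SparkContext exists;
// once a session is already running, this setting has no effect.
val spark = SparkSession.builder()
  .appName("graphframes-example") // hypothetical name
  .config("spark.jars.packages",
    "graphframes:graphframes:0.7.0-spark2.4-s_2.11")
  .getOrCreate()

// With the package on the classpath, the import resolves:
import org.graphframes._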

  • Ahh, so what I appear to be doing is starting a second spark-shell that includes my library, but then not using it, since I am already using an instance of spark-shell. Is that correct? – Joe S Feb 12 '19 at 18:36