
I tried to run the following simple code in Zeppelin:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{Logging, SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream

System.clearProperty("spark.driver.port")
System.clearProperty("spark.hostPort")

def maxWaitTimeMillis: Int = 20000
def actuallyWait: Boolean = false

val conf = new SparkConf().setMaster("local[2]").setAppName("Streaming test")
var sc = new SparkContext(conf)

def batchDuration: Duration = Seconds(1)
val ssc = new StreamingContext(sc, batchDuration)

This is the output in Zeppelin:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{Logging, SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
calculateRMSE: (output: org.apache.spark.streaming.dstream.DStream[(Double, Double)], n: org.apache.spark.streaming.dstream.DStream[Long])Double
res50: String = null
res51: String = null
maxWaitTimeMillis: Int
actuallyWait: Boolean
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@1daf4e42
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:356)
org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:150)
org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:525)
org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:345)
org.apache.zeppelin.scheduler.Job.run(Job.java:176)
org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
    at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2257)
    at org.apache.spark.SparkContext$$anonfun$assertNoOtherContextIsRunning$1.apply(SparkContext.scala:2239)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2239)
    at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2312)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:91)

Why does it say that I have multiple SparkContexts running? If I do not add the line `var sc = new SparkContext(conf)`, then `sc` is null, so it isn't created.

Klue
  • the SparkContext should be automatically created with the name `sc` by Zeppelin. I know in your post you've said that it's null, but it shouldn't be... – mgaido Apr 28 '16 at 14:07
  • @mark91: ok, you are right. I double checked the code and `sc` is really created. The problem now is with setting the checkpoint directory. – Klue Apr 28 '16 at 14:15
  • what is your problem with it? Have you tried to do `ssc.checkpoint("/my_wonderful_checkpoint_dir")`? – mgaido Apr 28 '16 at 14:17
  • @mark91: does it refer to a local file system or hdfs? – Klue Apr 28 '16 at 14:20
  • it depends on what the default filesystem is for your Spark installation – mgaido Apr 28 '16 at 14:57

1 Answer


You can't use multiple SparkContexts in Zeppelin. This is one of its limitations, since Zeppelin actually creates a hook to a single SparkContext.

If you wish to set up your SparkConf in Zeppelin, the easiest way is to set those properties in the Interpreter menu and then restart the interpreter so that they are applied to your SparkContext.

Now you can go back to your notebook and test your code:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{Logging, SparkConf, SparkContext}
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream

def maxWaitTimeMillis: Int = 20000
def actuallyWait: Boolean = false

def batchDuration: Duration = Seconds(1)
val ssc = new StreamingContext(sc, batchDuration)
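
If your streaming job also needs a checkpoint directory (as discussed in the comments), you can set it programmatically on the `StreamingContext` right after creating it. A minimal sketch; the path `/tmp/spark-checkpoint` is only an example and should be replaced with a directory on whatever Spark's default filesystem is in your installation (local disk or HDFS):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// `sc` is the SparkContext that Zeppelin creates for you -- do not create your own.
val ssc = new StreamingContext(sc, Seconds(1))

// Required for stateful operations (e.g. updateStateByKey) and driver recovery.
// The path is resolved against Spark's default filesystem; /tmp/spark-checkpoint
// is an assumed placeholder, not a required value.
ssc.checkpoint("/tmp/spark-checkpoint")
```

Setting the checkpoint directory this way avoids the `Spark Streaming cannot be initialized with both SparkContext and checkpoint as null` error mentioned in the comments, since `ssc` is built from Zeppelin's existing `sc` rather than from a checkpoint.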

More on that here.

eliasah
  • Thanks. It says that I should also set the checkpoint directory. It can be done as follows: `ssc.checkpoint(SparkCheckpointDir)`. However, how do I define `SparkCheckpointDir`? – Klue Apr 28 '16 at 14:14
  • You can also set the checkpoint directory within the Interpreter menu. You'll see the Spark properties with name and value. – eliasah Apr 28 '16 at 14:15
  • Do you know of a tutorial showing how to do this? Also, I'd appreciate it if you could explain how to set the checkpoint programmatically. Many thanks. – Klue Apr 28 '16 at 14:17
  • You can do that actually in Zeppelin for now. But you can set the checkpointdir programmatically in ssc – eliasah Apr 28 '16 at 14:18
  • Ok, I mean `var SparkCheckpointDir: File = _` Should it point to some directory in HDFS? – Klue Apr 28 '16 at 14:19
  • You can't do that in Zeppelin. – eliasah Apr 28 '16 at 14:20
  • Hmm. It still says `Spark Streaming cannot be initialized with both SparkContext and checkpoint as null` if I copy-paste your example. I also tried `val ssc = StreamingContext.getOrCreate(sc.getCheckpointDir.toString)`, but it lacks some other arguments. Unfortunately, very little information can be found on Google... – Klue Apr 28 '16 at 14:53
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/110544/discussion-between-eliasah-and-klue). – eliasah Apr 28 '16 at 14:59