
I am a newbie with Apache Zeppelin and I am trying to run it locally. I tried to run a simple sanity check to see that sc exists, and got the error below.

I compiled Zeppelin with pyspark support for Spark 1.5 (the version I use). I increased the memory to 5 GB and changed the port to 8091.
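
For reference, these are roughly the configuration changes involved (the port is the zeppelin.server.port property in zeppelin-site.xml, taken from the default template; treat the exact values as an illustration of my setup):

```
# conf/zeppelin-env.sh — bump the Zeppelin memory to 5 GB
export ZEPPELIN_MEM=-Xmx5g
```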

I am not sure what I did wrong to get the following error, or how I should solve it.

Thanks in advance

```
java.lang.ClassNotFoundException: org.apache.spark.repl.SparkCommandLine
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:401)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:485)
    at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:174)
    at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:152)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:302)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

Update: The solution for me was to downgrade my Scala version from 2.11.* to 2.10.*, build Apache Spark again, and run Zeppelin.
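
For anyone hitting the same thing, the rebuild was roughly the following (a sketch of what I ran; the Maven profiles -Pspark-1.5 and -Ppyspark come from the Zeppelin build instructions of that era, and the paths are placeholders):

```
# Rebuild Spark 1.5.x against the default Scala 2.10
cd /path/to/spark-1.5.2
./dev/change-scala-version.sh 2.10   # only needed if the tree was switched to 2.11 earlier
mvn -DskipTests clean package

# Rebuild Zeppelin against Spark 1.5 with pyspark support, then restart it
cd /path/to/incubator-zeppelin
mvn clean package -Pspark-1.5 -Ppyspark -DskipTests
./bin/zeppelin-daemon.sh restart
```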

Tom Ron
  • You say "locally" but you don't say whether your Spark config is local[*] or whether you are layering Zeppelin over an existing cluster. Also you don't say what steps you have taken to troubleshoot this. What is the master setting in your Interpreter menu? Can you post some of your zeppelin-env.sh or zeppelin-site.xml files? If sc is doing this I'd assume it's somehow the underlying cluster config; I assume pyspark works OK there? – JimLohse Jan 16 '16 at 03:57
  • Having installed Zeppelin last night myself, thanks for asking this; very cool product. Just for a "double-sanity check": what happens when you run the underlying Spark interactive shell for Scala, which automatically creates a SparkContext? If you get this solved please post it as an answer :) – JimLohse Jan 16 '16 at 22:48
  • Spark runs locally too. Scala does not run properly either. Regarding zeppelin-site.xml, the only property I changed is the port, to 8091. In zeppelin-env.sh I added two lines: `export ZEPPELIN_MEM=-Xmx5g` and `export SPARK_HOME=/opt/spark-1.5.2/`. pyspark runs OK. – Tom Ron Jan 18 '16 at 08:45
  • Recommend you put scala and python in the tags to attract a little wider crowd with more experience in Spark and Zeppelin. I have some ideas and will post an answer in an hour or two; busy right now. Just to be sure: you have the master on the Zeppelin Interpreter menu set to local, local[n] or local[*], not naming a specific server? And the same ownership on /opt/spark-1.5.2/* as the Zeppelin install? – JimLohse Jan 18 '16 at 15:52
  • local[*], and same ownership for both. Thanks for your efforts! – Tom Ron Jan 18 '16 at 16:08

2 Answers

3

I am making certain assumptions based on what you have answered in the comments. It sounds like the Zeppelin setup is good; when I looked up the class SparkCommandLine, it is part of Spark's core.

Now, Zeppelin has its own minimal embedded Spark classes, which are activated if you don't set SPARK_HOME. So first, per this GitHub page, try not setting SPARK_HOME (which you are setting) and HADOOP_HOME (which I don't think you are setting), to see whether eliminating your underlying Spark install "fixes" it:

Without SPARK_HOME and HADOOP_HOME, Zeppelin uses embedded Spark and Hadoop binaries that you have specified with mvn build option. If you want to use system provided Spark and Hadoop, export SPARK_HOME and HADOOP_HOME in zeppelin-env.sh You can use any supported version of spark without rebuilding Zeppelin.
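
In practice, something like this in conf/zeppelin-env.sh should be enough for that test (a sketch; zeppelin-daemon.sh is the standard start/stop script that ships with Zeppelin):

```
# Comment out (or unset) these so Zeppelin falls back to its embedded
# Spark/Hadoop classes instead of the /opt/spark-1.5.2 install
# export SPARK_HOME=/opt/spark-1.5.2/
# export HADOOP_HOME=/path/to/hadoop      # only if you had set it

# Restart so the interpreter process picks up the change
./bin/zeppelin-daemon.sh restart
```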

If that works (i.e. the embedded Spark runs), then you know we are looking at a Java classpath issue with your external Spark install. To try to fix this, there's one more setting that goes in the zeppelin-env.sh file:

ZEPPELIN_JAVA_OPTS

It's mentioned here on the Zeppelin mailing list; make sure you set it to point at the actual Spark jars so the JVM picks them up with a -classpath.
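
I can't give you the exact value without seeing your layout, but as a sketch it would look something like the following in conf/zeppelin-env.sh (the -Dspark.home property is an assumption drawn from mailing-list examples, and the path is a placeholder for wherever your Spark jars actually live):

```
# Sketch only — extra JVM options for the interpreter process;
# replace the property/path with whatever matches your own install
export ZEPPELIN_JAVA_OPTS="-Dspark.home=/opt/spark-1.5.2"
```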

Here's what my Zeppelin process looks like for comparison. I think the important part is the -cp argument; run ps on your system and look through your JVM options to see whether it points at similar locations:

```
/usr/lib/jvm/java-8-oracle/bin/java -cp /usr/local/zeppelin/interpreter/spark/zeppelin-spark-0.5.5-incubating.jar:/usr/local/spark/conf/:/usr/local/spark/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar
  -Xms1g -Xmx1g -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m -Dzeppelin.log.file=/usr/local/zeppelin/logs/zeppelin-interpreter-spark-jim-jim.log
  org.apache.spark.deploy.SparkSubmit
  --conf spark.driver.extraClassPath=:/usr/local/zeppelin/interpreter/spark/zeppelin-spark-0.5.5-incubating.jar
  --conf spark.driver.extraJavaOptions= -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m -Dzeppelin.log.file=/usr/local/zeppelin/logs/zeppelin-interpreter-spark-jim-jim.log
  --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer /usr/local/zeppelin/interpreter/spark/zeppelin-spark-0.5.5-incubating.jar 50309
```
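
To get the equivalent output on your side, something along these lines works (the grep pattern is just the interpreter class name that appears in your stack trace):

```
# Find the Zeppelin Spark interpreter JVM and inspect its -cp / -classpath entries
ps -ef | grep RemoteInterpreterServer
```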

Hope that helps. If that doesn't work, please edit your question to show your existing classpath.

JimLohse
  • If we could get the attention of a Spark contributor, I bet this could be solved more directly. @zero323? – JimLohse Jan 18 '16 at 18:03
  • Thank you for your answer. Unfortunately it still does not work. My Zeppelin process is very similar (and too long to put in a comment), apart from some version differences. In the 3rd paragraph, did you mean "If that works" or "If that does not work"? I also downloaded Zeppelin 0.5.5 and the process died. Hope someone from the Zeppelin maintainers can join the discussion. – Tom Ron Jan 19 '16 at 08:10
  • OK, after checking it closely I think the issue was the Scala version. Moving from Scala 2.11 to 2.10 and recompiling solved the problem. Thank you very much for your efforts. – Tom Ron Jan 19 '16 at 11:55
  • Oh, or should I say D'oh! I forgot to ask what Scala version. Yeah, Spark needs to be compiled differently for 2.11, per http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211. Glad you got it sorted out; you may want to post and accept your own answer here. And if you want to try to stick with 2.11, try that doc I linked and see if it works. – JimLohse Jan 19 '16 at 14:27
0

Zeppelin recently released version 0.6.1, which supports Scala 2.11 and Spark 2.0. I too was puzzled by this error message, since I could clearly see my Spark home directory in the classpath. The new version of Zeppelin works great; I'm currently running it with Spark 2.0/Scala 2.11.
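
For what it's worth, wiring the new version to an external Spark is still just the SPARK_HOME export in conf/zeppelin-env.sh (the path below is an example layout for a standard Spark 2.0 binary download, not necessarily yours):

```
# conf/zeppelin-env.sh — point Zeppelin 0.6.1 at a Spark 2.0 install
export SPARK_HOME=/opt/spark-2.0.0-bin-hadoop2.7   # example path
```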

Pete