
I'm trying out the Spark job server - specifically, the Docker container option. I was able to run the WordCountExample app in Spark local mode. However, I ran into an exception when I tried to point the app to a remote Spark master.

Following are the commands I used to run the WordCountExample app:

 1. sudo docker run -d -p 8090:8090 -e SPARK_MASTER=spark://10.501.502.503:7077 velvia/spark-jobserver:0.6.0
 2. sbt job-server-tests/package
 3. curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests_2.10-0.6.2-SNAPSHOT.jar localhost:8090/jars/test
 4. curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'

Following is the exception I hit when I ran step 4 above:

{
  "status": "ERROR",
  "result": {
    "message": "Futures timed out after [15 seconds]",
    "errorClass": "java.util.concurrent.TimeoutException",
    "stack": ["scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)", "scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)", "scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)", "akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:169)", "scala.concurrent.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3640)", "akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:167)", "akka.dispatch.BatchingExecutor$Batch.blockOn(BatchingExecutor.scala:101)", "scala.concurrent.Await$.result(package.scala:107)", ...

I started the remote Spark cluster (master and workers) using:

cd $SPARK_HOME
./sbin/start-all.sh

The remote cluster uses Spark version 1.5.1 (i.e., the prebuilt binary spark-1.5.1-bin-hadoop2.6).

Questions

  1. Any suggestions on how I could debug this?
  2. Are there any logs I could look into to figure out the root cause?

Thanks in advance.

jithinpt
  • Can you check how many Spark JVMs are running? I think you may only have a Spark master with no workers, so any job will ultimately time out, leading to issues like the one you're seeing. Just a hunch. – Jacek Laskowski Dec 01 '15 at 15:00
  • There are 4 worker JVMs running. I'm able to launch other Spark jobs on the Spark cluster. However, the Spark job server is timing out. – jithinpt Dec 01 '15 at 18:22

2 Answers


This could be a network issue. The SJS server should be reachable from the Spark cluster.
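A rough sketch of checks, assuming the placeholder master address from the question and using <container-id> for whatever docker ps reports (nc is assumed to be available in the image):

docker exec <container-id> nc -zv 10.501.502.503 7077
docker logs <container-id>

The first command tests whether the Spark master port is reachable from inside the container; the second prints the job server log, which usually records why the SparkContext could not be created. The workers also need to connect back to the driver, which runs inside the container, so if the container is not on the host network, starting it with --net=host (or setting spark.driver.host to an address the workers can reach) is a common fix.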

noorul

I had the same problem with Spark 1.6.1. I changed the job server version to the latest (0.6.2.mesos-0.28.1.spark-1.6.1) and it works for me.
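For the Docker setup from the question, that presumably means running the matching image tag, for example (the tag name here is inferred from the version string above, so check Docker Hub for the exact name):

sudo docker run -d -p 8090:8090 -e SPARK_MASTER=spark://10.501.502.503:7077 velvia/spark-jobserver:0.6.2.mesos-0.28.1.spark-1.6.1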

Cortwave