
There is a lot I clearly don't understand about Spark, Spark Jobserver, and Mesosphere's DC/OS. But I very much like the Jobserver project, and also very much like our DC/OS cluster, and would really like to get them running together.

Throwing the Docker container into a Marathon file, like this example, does not work. I thought this might be because I don't know what SPARK_MASTER URL to pass in (which I still don't know; any help there would be greatly appreciated), but then I tried removing that from the Marathon file, which should still run the project in local mode, and that also doesn't work. That makes me realize that, beyond not knowing how to connect this Jobserver to my DC/OS Spark dispatcher, I also don't know why this Docker container fails on the cluster but not on my local machine, even when it is passed no arguments.

My logs do not show much, and the Docker container exits with a status of 137 after the following in stdout:

LOG_DIR empty; logging will go to /tmp/job-server

When I run things locally, that is the last line before log4j output starts appearing on stdout and tells me the Jobserver is starting up. I see the following in stderr:

app/server_start.sh: line 54:    15 Killed                  $SPARK_HOME/bin/spark-submit --class $MAIN --driver-memory $JOBSERVER_MEMORY --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS" --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES" $@ $appdir/spark-job-server.jar $conffile

This seems to suggest that server_start.sh from the Spark Jobserver Docker image is running, and that the script is dying for some reason.
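
For context, exit status 137 is not specific to Jobserver. A quick sanity check in a shell (this reading of the status code is my interpretation, not something taken from the logs):

# Exit statuses above 128 encode 128 plus the fatal signal number.
echo $((137 - 128))   # 9
kill -l 9             # KILL

So the spark-submit process was killed with SIGKILL rather than exiting on its own, which is commonly what Docker/Mesos deliver when a container exceeds its memory allocation (though any external kill produces the same status).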

I stripped my marathon file all the way down to this, which is still giving me the same errors:

{
  "id": "/jobserver",
  "cpus": 0.5,
  "mem": 100,
  "ports": [0],
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "velvia/spark-jobserver:0.6.2.mesos-0.28.1.spark-1.6.1"
    }
  }
}

Any help would be greatly appreciated.

Nandan Rao

1 Answer


The following worked for me when I tried it.

{
  "id": "/spark.jobserver",
  "cmd": null,
  "cpus": 2,
  "mem": 2048,
  "disk": 50,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "velvia/spark-jobserver:0.6.2.mesos-0.28.1.spark-1.6.1",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 8090,
          "hostPort": 0,
          "servicePort": 10001,
          "protocol": "tcp",
          "labels": {}
        }
      ],
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "env": {
    "SPARK_MASTER": "mesos://zk://10.29.83.3:2181,10.29.83.4:2181/mesos"
  },
  "portDefinitions": [
    {
      "port": 10001,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
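
If it helps, a minimal way to submit an app definition like this (the file name spark-jobserver.json is my own choice, and marathon.mesos:8080 is the typical Marathon address inside a DC/OS cluster; adjust both to your setup):

# With the DC/OS CLI:
dcos marathon app add spark-jobserver.json

# Or directly against the Marathon REST API:
curl -X POST -H "Content-Type: application/json" \
     -d @spark-jobserver.json \
     http://marathon.mesos:8080/v2/apps

Once the task is running, the Jobserver REST API should answer on the mapped host port, e.g. curl http://<agent-host>:<host-port>/jars.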
noorul
  • Ok, so what worked for me here was actually just running with higher CPU/memory values than were suggested in docker.md in the Jobserver documentation. But it turns out the SPARK_MASTER does not actually work for me at all, and I still can't find one that works. I should be able to connect to the Spark Mesos Dispatcher that is run by the Spark application through Marathon, no? It does not seem to work. – Nandan Rao Jun 23 '16 at 12:12