Hi, I am trying to stress test the Spark job server, and I am sharing the Spark context among the submitted jobs with the following properties:
spark.executor.cores='2'
spark.cores.max='1'
spark.driver.cores='1'
spark.driver.memory='1g'…
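For reference, properties like these can also be declared as defaults for the shared context in the job server's HOCON configuration. A minimal sketch, modeled on spark-jobserver's bundled `local.conf` template (treat the exact keys and their mapping as assumptions to verify against your jobserver version):

```
spark {
  context-settings {
    num-cpu-cores = 1        # jobserver maps this to spark.cores.max
    memory-per-node = 1g     # executor memory per node

    # Plain Spark properties can also be passed through:
    spark.executor.cores = 2
    spark.driver.cores = 1
    spark.driver.memory = 1g
  }
}
```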
I'm running a job using spark-jobserver (it takes roughly 10 minutes).
The job randomly crashes during its execution (about one time in two) with the following exception on the executor:
ERROR 2016-10-13 19:22:58,617 Logging.scala:95 -…
In Spark 1.6.1 (code in Scala 2.10), I am trying to write a DataFrame to a Parquet file:
import sqlContext.implicits._ // the toDF() implicits live on SQLContext in Spark 1.6, not on SparkContext
val triples = file.map(p => _parse(p, " ", true)).toDF()…
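For comparison, a minimal self-contained sketch of writing a DataFrame to Parquet on the Spark 1.6 API (the sample data, column names, and output path here are illustrative assumptions; the question's `_parse` helper is not shown):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetWriteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // provides toDF() on RDDs of case classes / tuples

    // Hypothetical triples standing in for the parsed file
    val triples = sc.parallelize(Seq(("s", "p", "o"))).toDF("subject", "predicate", "obj")
    triples.write.parquet("/tmp/triples.parquet") // path is illustrative
    sc.stop()
  }
}
```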
I am a complete novice in Spark and have just started exploring it further. I chose the longer path: instead of installing Hadoop through a CDH distribution, I installed Hadoop from the Apache website and am setting the config files myself to…
In my company we are currently using the Spark interpreter to dynamically generate class files with spark-jobserver. Those class files are generated on our Spark cluster's driver and saved into the directory (on that driver) defined by using…
Can anyone suggest better documentation for spark-jobserver? I have gone through the spark-jobserver URL but was unable to follow it. It would be great if someone could explain, step by step, how to use spark-jobserver.
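As a starting point, the core of using spark-jobserver is implementing its job trait and then uploading the assembled jar over the REST API (typically `POST /jars/<appName>` followed by `POST /jobs?appName=...&classPath=...`). A hedged sketch of a minimal job against the pre-0.7 `SparkJob` API (the object name and config key are illustrative):

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

// Minimal word-count job: jobserver hands us a managed SparkContext
object WordCountSketch extends SparkJob {
  // Called first; reject bad input before the job runs
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  // The job body; "input.string" would be passed in the POST body
  override def runJob(sc: SparkContext, config: Config): Any = {
    val input = config.getString("input.string")
    sc.parallelize(input.split(" ").toSeq).countByValue()
  }
}
```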
Tools used…
I have built my job jar using sbt assembly so that all dependencies are in one jar. When I try to submit my binary to spark-jobserver, I get an akka.pattern.AskTimeoutException.
I modified my configuration to be able to submit large jars (I added…
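Not from the question, but for context: in the 0.6/0.7-era jobserver, large jar uploads are usually governed by the spray HTTP layer's limits. A sketch of the relevant config fragment (the exact keys and values are assumptions to verify against your jobserver version):

```
# spray settings used by the jobserver's HTTP layer
spray.can.server {
  idle-timeout = 120 s
  request-timeout = 100 s            # should stay below idle-timeout
  parsing.max-content-length = 250m  # raise to accept large assembly jars
}
```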
I am trying to run spark-jobserver with Spark 2.0.
I cloned the spark-2.0-preview branch from the GitHub repository and followed the deployment guide, but when I try to deploy the server using bin/server_deploy.sh I get a compilation error:
Error:
[error]…
I'm getting messages along the lines of the following in my Spark JobServer logs:
Stage 14 contains a task of very large size (9523 KB). The maximum recommended task size is 100 KB.
I'm creating my RDD with this code:
List data = new…
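Not from the question, but for context: this warning usually means a large object is being serialized into every task, e.g. a big in-driver collection passed to `sc.parallelize` or captured by a closure. A common remedy is broadcasting the data once per executor instead; a hedged Scala sketch (the lookup map and sizes are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch").setMaster("local[*]"))

    // Large driver-side data: shipping this inside each task closure
    // is what triggers the "task of very large size" warning
    val bigLookup: Map[Int, String] = (1 to 1000000).map(i => i -> s"v$i").toMap

    val bc = sc.broadcast(bigLookup) // shipped once per executor, not once per task
    val hits = sc.parallelize(1 to 100).flatMap(i => bc.value.get(i)).count()
    println(hits)
    sc.stop()
  }
}
```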
I'm using Apache Spark 2.0.2 together with Apache JobServer 0.7.0.
I know this is not best practice, but it is a first step. My server has 52 GB of RAM and 6 CPU cores, CentOS 7 x64, Java(TM) SE Runtime Environment (build 1.7.0_79-b15), and it has…
I've set up a spark-jobserver to enable complex queries on a reduced dataset.
The jobserver executes two operations:
Sync with the main remote database: it makes a dump of some of the server's tables, reduces and aggregates the data, and saves the result…
When I post simultaneous jobserver requests, they always seem to be processed in FIFO mode. This is despite my best efforts to enable the FAIR scheduler. How can I ensure that my requests are always processed in parallel?
Background: On my cluster…
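For reference, FAIR scheduling within a single SparkContext needs two pieces: the context-level setting (`spark.scheduler.mode=FAIR`, optionally with `spark.scheduler.allocation.file` for pool definitions) and a per-thread pool assignment on the thread that runs each job's actions. A sketch (the pool name is hypothetical; note that jobserver also limits concurrency via its own `max-jobs-per-context` setting, which is worth checking):

```scala
// Assumes the context was created with spark.scheduler.mode=FAIR
sc.setLocalProperty("spark.scheduler.pool", "fair_pool") // must be set on the job's thread
// ... run this request's RDD actions here ...
sc.setLocalProperty("spark.scheduler.pool", null)        // clear the pool when done
```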
I'm using a Spark cluster in standalone mode, plus spark-jobserver, to execute my jobs written in Scala. I launched the job-server in a Docker container:
docker run -d -p 8090:8090 -e SPARK_MASTER=spark://spark-server:7077…
In DataStax Enterprise Edition 4.8, Spark Jobserver 0.5.2 has been specially compiled against the supported version of Apache Spark, 1.4.1.1. The Spark job will read data from Cassandra and write summarized data into another table in the same keyspace.
Is…
First of all, our standalone Spark cluster consists of 20 nodes; each of them has 40 cores and 128 GB of memory (including the 2 masters).
1.
We use Spark-Job-Server to reuse the Spark context (at the core, we want to reuse cached RDDs for querying),…
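The cached-RDD reuse described above is what spark-jobserver's `NamedRddSupport` mixin provides: RDDs registered under a name survive across jobs in the same context. A hedged sketch against the 0.6.x-era API (the RDD name and data source are illustrative assumptions):

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobValid, SparkJobValidation}

object CachedQuerySketch extends SparkJob with NamedRddSupport {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    // Reuse the RDD if a previous job in this shared context already built it
    val rows = namedRdds.getOrElseCreate("shared-rows", {
      sc.textFile("/data/rows.txt")        // illustrative source
        .persist(StorageLevel.MEMORY_ONLY) // keep it cached for later queries
    })
    rows.count()
  }
}
```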