I am using Spark Job Server (SJS) to submit Spark jobs to a cluster. The application I am trying to test is a Spark program built on the SANSA stack, specifically SANSA Query. SANSA is used for scalable processing of large amounts of RDF data, and SANSA Query is the SANSA library for querying that RDF data. When I run the application directly with spark-submit it works as expected, but when I run the same program through Spark Job Server, it fails most of the time with the exception below.
20/05/29 18:57:00 INFO BlockManagerInfo: Added rdd_44_0 in memory on us1salxhpw0653.corpnet2.com:37017 (size: 16.0 B, free: 366.2 MB)
20/05/29 18:57:00 ERROR ApplicationMaster: RECEIVED SIGNAL TERM
20/05/29 18:57:00 INFO SparkContext: Invoking stop() from shutdown hook
20/05/29 18:57:00 INFO JobManagerActor: Got Spark Application end event, stopping job manger.
20/05/29 18:57:00 INFO JobManagerActor: Got Spark Application end event externally, stopping job manager
20/05/29 18:57:00 INFO SparkUI: Stopped Spark web UI at http://10.138.32.96:46627
20/05/29 18:57:00 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 63, us1salxhpw0653.corpnet2.com, executor 1, partition 3, NODE_LOCAL, 4942 bytes)
20/05/29 18:57:00 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 60) in 513 ms on us1salxhpw0653.corpnet2.com (executor 1) (1/560)
20/05/29 18:57:00 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID 64, us1salxhpw0669.corpnet2.com, executor 2, partition 4, NODE_LOCAL, 4942 bytes)
20/05/29 18:57:00 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 61) in 512 ms on us1salxhpw0669.corpnet2.com (executor 2) (2/560)
20/05/29 18:57:00 INFO TaskSetManager: Starting task 5.0 in stage 3.0 (TID 65, us1salxhpw0670.corpnet2.com, executor 3, partition 5, NODE_LOCAL, 4942 bytes)
20/05/29 18:57:00 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 62) in 536 ms on us1salxhpw0670.corpnet2.com (executor 3) (3/560)
20/05/29 18:57:00 INFO BlockManagerInfo: Added rdd_44_4 in memory on us1salxhpw0669.corpnet2.com:34922 (size: 16.0 B, free: 366.2 MB)
20/05/29 18:57:00 INFO BlockManagerInfo: Added rdd_44_3 in memory on us1salxhpw0653.corpnet2.com:37017 (size: 16.0 B, free: 366.2 MB)
20/05/29 18:57:00 INFO DAGScheduler: Job 2 failed: save at SansaQueryExample.scala:32, took 0.732943 s
20/05/29 18:57:00 INFO DAGScheduler: ShuffleMapStage 3 (save at SansaQueryExample.scala:32) failed in 0.556 s due to Stage cancelled because SparkContext was shut down
20/05/29 18:57:00 ERROR FileFormatWriter: Aborting job null.
org.apache.spark.SparkException: Job 2 cancelled because SparkContext was shut down
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:820)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:818)
    at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
    at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:818)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1732)
    at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
    at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1651)
    at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1923)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1317)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1922)
    at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:584)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
Code used for direct execution (via spark-submit):
import org.apache.jena.riot.Lang
import org.apache.spark.sql.SparkSession
// SANSA implicits for reading RDF and running SPARQL (sansa-rdf-spark / sansa-query-spark)
import net.sansa_stack.rdf.spark.io._
import net.sansa_stack.query.spark.query._

object SansaQueryExampleWithoutSJS {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sansa stack example").getOrCreate()

    val input = "hdfs://user/dileep/rdf.nt"
    val sparqlQuery: String = "SELECT * WHERE {?s ?p ?o} LIMIT 10"
    val lang = Lang.NTRIPLES

    // Load the N-Triples file into an RDD of Jena triples and print it
    val graphRdd = spark.rdf(lang)(input)
    graphRdd.collect().foreach(println)

    // Run the SPARQL query and write the result as CSV
    val result = graphRdd.sparql(sparqlQuery)
    result.write.format("csv").mode("overwrite").save("hdfs://user/dileep/test-out")

    spark.stop()
  }
}
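For comparison, the working direct execution is launched roughly like this (a minimal sketch: the assembly jar name, class name and resource settings are placeholders, and YARN cluster mode is only assumed from the ApplicationMaster entries in the log above):

spark-submit \
  --class SansaQueryExampleWithoutSJS \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 3 \
  --executor-memory 2g \
  sansa-query-example-assembly.jar   # hypothetical fat jar containing the SANSA dependencies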
Code integrated with Spark Job Server:
import com.typesafe.config.Config
import org.apache.jena.riot.Lang
import org.apache.spark.sql.SparkSession
import org.scalactic._
import spark.jobserver.SparkSessionJob
import spark.jobserver.api.{JobEnvironment, SingleProblem, ValidationProblem}
// SANSA implicits for reading RDF and running SPARQL
import net.sansa_stack.rdf.spark.io._
import net.sansa_stack.query.spark.query._
import scala.util.Try

object SansaQueryExample extends SparkSessionJob {
  override type JobData = Seq[String]
  override type JobOutput = collection.Map[String, Long]

  override def validate(sparkSession: SparkSession, runtime: JobEnvironment, config: Config):
    JobData Or Every[ValidationProblem] = {
    Try(config.getString("input.string").split(" ").toSeq)
      .map(words => Good(words))
      .getOrElse(Bad(One(SingleProblem("No input.string param"))))
  }

  override def runJob(sparkSession: SparkSession, runtime: JobEnvironment, data: JobData): JobOutput = {
    val input = "hdfs://user/dileep/rdf.nt"
    val sparqlQuery: String = "SELECT * WHERE {?s ?p ?o} LIMIT 10"
    val lang = Lang.NTRIPLES

    // Load the N-Triples file into an RDD of Jena triples and print it
    val graphRdd = sparkSession.rdf(lang)(input)
    graphRdd.collect().foreach(println)

    // Run the SPARQL query and write the result as CSV
    val result = graphRdd.sparql(sparqlQuery)
    result.write.format("csv").mode("overwrite").save("hdfs://user/dileep/test-out")

    sparkSession.sparkContext.parallelize(data).countByValue
  }
}
The steps for executing an application via Spark Job Server are explained here; mainly (a hedged curl sketch of these calls follows the list):
- upload the jar to SJS through the REST API
- create a Spark context with the required memory and cores, through another API call
- execute the job via another API call, referencing the jar and the context already created
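For reference, the REST calls look roughly like the following (a minimal sketch based on the spark-jobserver documentation; the host/port, binary name, context name and resource sizes are illustrative assumptions, not my exact values):

# 1. Upload the application jar (hypothetical name) as a binary called "sansa-example"
curl --data-binary @sansa-query-example-assembly.jar \
  -H "Content-Type: application/java-archive" \
  localhost:8090/binaries/sansa-example

# 2. Create a long-running context with the desired cores and memory
curl -d "" "localhost:8090/contexts/sansa-context?num-cpu-cores=4&memory-per-node=2g"

# 3. Run the job on that context, passing the config read in validate()
curl -d "input.string = a b c" \
  "localhost:8090/jobs?appName=sansa-example&classPath=SansaQueryExample&context=sansa-context&sync=true"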
When I observed different executions of the program, I could see that Spark Job Server behaves inconsistently: the program does work on a few occasions without any errors. I also observed that the SparkContext is being shut down for some unknown reason. I am using SJS 0.8.0, SANSA 0.7.1 and Spark 2.4.