I have a long-running Spark context via Spark Job Server. Batch jobs are triggered periodically, and in some scenarios a batch job fails with the exception below. The stack trace gives little clue as to where the exception originated.
After restarting the job server, the same job with the same input runs fine.
From the logs, the code runs successfully up to the point where it calls the method below.
import org.joda.time.DateTime
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.rdd.RDD
case class Key(start: DateTime)
//method
val startDate = DateTime.parse("2016-07-04T00:00:00.000+00:00")
val endDate = DateTime.parse("2016-07-05T00:00:00.000+00:00")
val keyspace = "test_keyspace"
val table = "test_table"
val dates = List(startDate)
val keys = dates.map(date => Key(date))
val rdd = sc.parallelize(keys)
.joinWithCassandraTable(keyspace, table)
.where("ts > ?", startDate)
.where("ts <= ?", endDate)
.map(x => x._2)
val ids = rdd.flatMap(x => x.getSet[String]("ids")).distinct.sortBy(x => x).collect().toList
logger.info(s"$ids")
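For context on what this kind of trace usually means: the repeating writeSerialData -> writeOrdinaryObject -> defaultWriteFields frames are the signature of Java serialization recursing through a very deep object graph, with each linked object adding a few stack frames. A minimal, standalone sketch of that mechanism (unrelated to Spark or the connector, using a hypothetical Node class purely for illustration):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// A simple serializable linked node. Java serialization recurses once per
// link (writeObject0 -> defaultWriteFields -> writeObject0 ...), which is
// exactly the repeating cycle in the stack trace below.
case class Node(value: Int, next: Node)

// Returns true if serializing a chain of the given depth overflows the stack.
def overflowsOnSerialize(depth: Int): Boolean = {
  val chain = (1 to depth).foldLeft(null: Node)((next, i) => Node(i, next))
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try { out.writeObject(chain); false }
  catch { case _: StackOverflowError => true }
}

println(overflowsOnSerialize(10))      // false: shallow graphs serialize fine
println(overflowsOnSerialize(500000))  // true: deep graphs blow the stack
```

If something captured by the job's closure (or accumulated in the long-running context) grows into such a chain over days, it would explain why the error appears only after a long uptime and disappears on restart.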
Here is the stack trace. It repeats the writeSerialData -> writeOrdinaryObject -> defaultWriteFields cycle over and over.
WARN s.j.JobManagerActor [] [] - Exception from job c269030a-615a-4218-97eb-328008e3c667:
java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:55) ~[scala-library-2.10.5.jar:na]
at scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:47) ~[scala-library-2.10.5.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:244) ~[scala-library-2.10.5.jar:na]
at scala.concurrent.Promise$class.complete(Promise.scala:55) ~[scala-library-2.10.5.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) ~[scala-library-2.10.5.jar:na]
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:23) ~[scala-library-2.10.5.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
Caused by: java.lang.StackOverflowError: null
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) ~[na:1.8.0_72]
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) ~[na:1.8.0_72]
Question
Will collect cause a recursive call? Re-running the job with the same input works fine. Any ideas on how to debug this?
The issue is not easily reproducible; it only occurs after the job server has been running for some days.
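One way I could imagine gathering more information on the next occurrence (a sketch, assuming the job server start script honours JAVA_OPTS; adjust for your deployment). Both flags are standard JVM options, but note that extendedDebugInfo mainly annotates exceptions thrown during serialization and may not annotate a StackOverflowError, so treat this as a diagnostic aid rather than a fix:

```shell
#  - extendedDebugInfo makes ObjectOutputStream record the field path,
#    so a serialization failure names the object graph being walked;
#  - -Xss raises the per-thread stack limit, which can turn a borderline
#    StackOverflowError into a completed (and inspectable) run.
export JAVA_OPTS="$JAVA_OPTS -Dsun.io.serialization.extendedDebugInfo=true -Xss4m"
```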