
I deployed a Spark application and encountered this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 602, not attempting to retry it. Exception during serialization: java.io.NotSerializableException: scala.Unit$
Serialization stack:
    - object not serializable (class: scala.Unit$, value: object scala.Unit)
    - element of array (index: 0)
    - array (class [Lscala.Unit$;, size 1)
    - field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
    - object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(object scala.Unit))
    - writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
    - object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@2699)
    - field (class: org.apache.spark.scheduler.ShuffleMapTask, name: partition, type: interface org.apache.spark.Partition)
    - object (class org.apache.spark.scheduler.ShuffleMapTask, ShuffleMapTask(68, 0))
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:1889)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1877)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1876)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:274)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
    at org.apache.spark.rdd.RDD.$anonfun$foreach$1(RDD.scala:927)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.foreach(RDD.scala:925)

This error is extremely rare and cannot be found anywhere on the internet. I'm under the impression that it should be impossible for this error to happen, since the Unit type and its singleton value are erased by the Scala compiler and become void in JVM bytecode.
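For illustration, here is a minimal sketch of what I mean (assumed names, not the actual failing application): a Unit-returning method compiles down to a void JVM method, and a Unit value kept in a generic collection is boxed as scala.runtime.BoxedUnit, which is serializable, so I don't see where a scala.Unit$ module object could come from.

    object UnitErasureSketch {
      // A Unit-returning method compiles to a `void` method in bytecode.
      def sideEffect(): Unit = println("hello")

      def main(args: Array[String]): Unit = {
        // A Unit value stored in a generic collection is boxed as
        // scala.runtime.BoxedUnit (which is java.io.Serializable),
        // not as the scala.Unit$ module object shown in the stack trace.
        val xs: Seq[Unit] = Seq((), ())
        println(xs.head.asInstanceOf[AnyRef].getClass.getName)  // scala.runtime.BoxedUnit
      }
    }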

Why could this problem happen, and how do I eliminate it in the future?

Sorry, I forgot to describe the environment:

The Spark application was compiled for Spark 2.4 and Scala 2.12, and deployed in local[*] mode.
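For reference, a build.sbt sketch of that setup (the exact patch versions here are assumptions, not the real build file):

    scalaVersion := "2.12.10"

    // Spark core for the 2.4 line, cross-built for Scala 2.12
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5" % "provided"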

tribbloid
  • It will help if you share some code that is causing this issue. This is quite a common error in Spark-Scala development. – Amit Mar 27 '20 at 20:03
  • I'm trying to reproduce it with a shorter example but so far haven't been able to. BTW, serialisation errors indeed happen a lot in Spark dev, but the type Unit being non-serialisable is very rare; in fact, Unit should be a placeholder and be removed when Scala is compiled into JVM bytecode! – tribbloid Mar 27 '20 at 20:41
  • Re: "Unit being not serialisable is very rare", it depends on what you are doing in the Unit. It seems you are using a foreach and probably passing a function to it. That function may be using something which belongs to a class or object, which would result in this exception (see the sketch after these comments). – Amit Mar 27 '20 at 22:11

0 Answers