
I have been successfully using Spark 2.4.3 (Scala, on EMR 5.25.0) together with Delta Lake IO 0.6.0. My jobs run fine, but while doing some optimisation and housekeeping I noticed this strange exception. It does not appear to involve my code and it does not affect the successful completion of the Spark application, but it still raises eyebrows :) I have searched through the Spark issues and elsewhere but did not find any explanation or further hints for it. It happens during this job:

20/05/13 23:34:28 INFO SparkContext: Starting job: apply at DatabricksLogging.scala:77
20/05/13 23:34:28 INFO DAGScheduler: Registering RDD 81 (apply at DatabricksLogging.scala:77)
20/05/13 23:34:28 INFO DAGScheduler: Registering RDD 96 (apply at DatabricksLogging.scala:77)
20/05/13 23:34:28 INFO DAGScheduler: Registering RDD 88 (apply at DatabricksLogging.scala:77)
20/05/13 23:34:28 INFO DAGScheduler: Registering RDD 101 (apply at DatabricksLogging.scala:77)
20/05/13 23:34:28 INFO DAGScheduler: Registering RDD 104 (apply at DatabricksLogging.scala:77)
20/05/13 23:34:28 INFO DAGScheduler: Got job 205 (apply at DatabricksLogging.scala:77) with 1 output partitions
20/05/13 23:34:28 INFO DAGScheduler: Final stage: ResultStage 1216 (apply at DatabricksLogging.scala:77)
20/05/13 23:34:28 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1215)
20/05/13 23:34:28 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 1215)
20/05/13 23:34:28 INFO DAGScheduler: Submitting ShuffleMapStage 1212 (MapPartitionsRDD[96] at apply at DatabricksLogging.scala:77), which has no missing parents
20/05/13 23:34:29 INFO MemoryStore: Block broadcast_220 stored as values in memory (estimated size 55.2 KB, free 4.6 GB)
20/05/13 23:34:29 INFO MemoryStore: Block broadcast_220_piece0 stored as bytes in memory (estimated size 20.7 KB, free 4.6 GB)
20/05/13 23:34:29 INFO BlockManagerInfo: Added broadcast_220_piece0 in memory on ip-10-10-175-231.eu-west-1.compute.internal:43215 (size: 20.7 KB, free: 4.6 GB)
20/05/13 23:34:29 INFO SparkContext: Created broadcast 220 from broadcast at DAGScheduler.scala:1201
20/05/13 23:34:29 INFO DAGScheduler: Submitting 521 missing tasks from ShuffleMapStage 1212 (MapPartitionsRDD[96] at apply at DatabricksLogging.scala:77) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))
20/05/13 23:34:29 INFO YarnClusterScheduler: Adding task set 1212.0 with 521 tasks

The exception:

20/05/13 23:36:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 10 to 10.10.175.48:33590
20/05/13 23:36:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 10 to 10.10.162.50:55798
20/05/13 23:36:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 10 to 10.10.174.108:42382
20/05/13 23:36:23 INFO TaskSetManager: Starting task 188.0 in stage 1214.0 (TID 22247, ip-10-10-175-231.eu-west-1.compute.internal, executor 3, partition 188, PROCESS_LOCAL, 8073 bytes)
20/05/13 23:36:23 INFO TaskSetManager: Finished task 95.0 in stage 1214.0 (TID 22154) in 4006 ms on ip-10-10-175-231.eu-west-1.compute.internal (executor 3) (1/200)
20/05/13 23:36:23 ERROR AsyncEventQueue: Listener EventLoggingListener threw an exception
java.lang.ClassCastException: java.util.Collections$SynchronizedSet cannot be cast to java.util.List
    at org.apache.spark.util.JsonProtocol$.accumValueToJson(JsonProtocol.scala:348)
    at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:324)
    at org.apache.spark.util.JsonProtocol$$anonfun$accumulableInfoToJson$3.apply(JsonProtocol.scala:324)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.util.JsonProtocol$.accumulableInfoToJson(JsonProtocol.scala:324)
    at org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:317)
    at org.apache.spark.util.JsonProtocol$$anonfun$accumulablesToJson$2.apply(JsonProtocol.scala:317)
    at scala.collection.immutable.List.map(List.scala:288)
    at org.apache.spark.util.JsonProtocol$.accumulablesToJson(JsonProtocol.scala:317)
    at org.apache.spark.util.JsonProtocol$.taskInfoToJson(JsonProtocol.scala:309)
    at org.apache.spark.util.JsonProtocol$.taskEndToJson(JsonProtocol.scala:149)
    at org.apache.spark.util.JsonProtocol$.sparkEventToJson(JsonProtocol.scala:76)
    at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:138)
    at org.apache.spark.scheduler.EventLoggingListener.onTaskEnd(EventLoggingListener.scala:158)
    at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
    at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
    at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
    at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
20/05/13 23:36:24 INFO TaskSetManager: Starting task 189.0 in stage 1214.0 (TID 22248, ip-10-10-175-231.eu-west-1.compute.internal, executor 19, partition 189, PROCESS_LOCAL, 8073 bytes)
20/05/13 23:36:24 INFO TaskSetManager: Finished task 39.0 in stage 1214.0 (TID 22098) in 4276 ms on ip-10-10-175-231.eu-west-1.compute.internal (executor 19) (2/200)

Note: I noticed that these exceptions do not happen when we first load the Delta table, because the initial load obviously does not use Delta Lake IO's .merge functionality. That leads me to believe it is related to something being logged during the merge operation. Again, this does not seem to affect the results, which come out as expected.
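For context, the step that triggers it is an ordinary Delta Lake upsert. Below is a minimal sketch of that kind of merge; the S3 paths, the "id" join key and the update logic are placeholders, not the actual job:

import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession

// Minimal sketch of a Delta Lake 0.6.0 upsert; paths, join key and
// source data are placeholders, not the real job.
object MergeSketch {
  def upsert(spark: SparkSession): Unit = {
    val updates = spark.read.parquet("s3://my-bucket/staging/updates")     // placeholder source
    val target  = DeltaTable.forPath(spark, "s3://my-bucket/delta/events") // placeholder target

    target.as("t")
      .merge(updates.as("s"), "t.id = s.id") // placeholder join condition
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()
  }
}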

It would be nice if anyone has an idea about this behaviour, so I can check whether or not it is an issue in Delta Lake IO 0.6.0. Thanks!

Carlos Costa

1 Answer


This error doesn't impact your jobs at all, except that it may hamper debugging when you look at the Spark UI on the Spark History Server: you may see an active stage that should already have finished.

This issue will be fixed in Apache Spark 2.4.7 / 3.0.1 / 3.1.0. Please check the corresponding Spark JIRA ticket for more details regarding this issue.
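From the stack trace, the value that fails to serialise is a Collections.synchronizedSet (used by Delta merge's internal accumulator), while JsonProtocol.accumValueToJson expects an internal accumulator value it doesn't recognise to be a java.util.List. A standalone sketch of that failing cast (illustration only, not Spark's actual source):

import java.util.{Collections, HashSet, List => JList}

// Illustration only: reproduces the cast that fails inside
// JsonProtocol.accumValueToJson when a Set-backed accumulator value
// is treated as if it were a java.util.List.
object CastIllustration extends App {
  val accumValue: AnyRef = Collections.synchronizedSet(new HashSet[String]())

  // Throws java.lang.ClassCastException:
  // java.util.Collections$SynchronizedSet cannot be cast to java.util.List
  val asList = accumValue.asInstanceOf[JList[String]]
  println(asList.size())
}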

zsxwing
  • Can you please explain whether this means there is some issue in the execution of the task? `20/05/13 23:34:29 INFO DAGScheduler: Submitting 521 missing tasks from ShuffleMapStage 1212 (MapPartitionsRDD[96] at apply at DatabricksLogging.scala:77) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))` Submitting 521 missing tasks is quite a concern there. – manishbelsare Apr 30 '21 at 17:44