I encountered the following exception:

Exception in thread "main" java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable

I have enabled checkpointing outside this class (a rough sketch of that setup is included right after the class below), and I use this class to do some processing. Spark says that this class is not serializable:
class EventhubsStateTransformComponent(inStream: DStream[EventhubsEvent]) extends PipelineComponent with Logging {

  def process() = {
    inStream.foreachRDD(rdd => {
      if (rdd.isEmpty()) {
        logInfo("Extract outstream is empty...")
      } else {
        logInfo("Extract outstream is not empty...")
      }
    })

    // TODO: eventhubsId is hardcoded
    val eventhubsId = "1"
    val statePairStream = inStream.map(eventhubsEvent =>
      ((eventhubsId, eventhubsEvent.partitionId), eventhubsEvent.eventOffset))
    val eventhubsEventStateStream =
      statePairStream.mapWithState(StateSpec.function(EventhubsStreamState.updateStateFunc _))
    val snapshotStateStream = eventhubsEventStateStream.stateSnapshots()
    val out = snapshotStateStream.map(state =>
      (state._1._1, state._1._2, state._2, System.currentTimeMillis() / 1000))
    outStream = out
  }
}
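For context, checkpointing is enabled on the StreamingContext outside this component. I have not pasted the real setup code, but it is roughly the sketch below (the app name, batch interval, and checkpoint directory are placeholders, not the actual values):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("eventhubs-etl")
val ssc = new StreamingContext(conf, Seconds(10))
// Setting a checkpoint directory enables DStream checkpointing
// (mapWithState requires it), so the DStream graph gets serialized.
ssc.checkpoint("hdfs:///checkpoints/eventhubs-etl")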
P.S. EventhubsEvent is a case class.
=======================================================
New edit: After I made this class extend Serializable, the exception disappeared. But I wonder in which cases we need to make our own classes extend Serializable. Does it mean that if a class contains a foreachRDD operation, checkpointing will validate the code and require the whole object containing the foreachRDD closure to be serializable? From what I remember, in some cases only the objects referenced inside the foreachRDD scope need to be serializable.
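For example, my (possibly wrong) understanding is something like the minimal sketch below; the class and field names are made up for illustration and are not from my real pipeline:

import org.apache.spark.streaming.dstream.DStream

class Owner(stream: DStream[String]) {      // Owner is NOT Serializable
  val prefix = "batch"                      // a field of the enclosing class

  def processWithLocalCopy(): Unit = {
    val p = prefix                          // copy the field into a local val
    // The closure captures only the local String `p`, so serializing it
    // does not drag Owner in.
    stream.foreachRDD(rdd => println(s"$p count=${rdd.count()}"))
  }

  def processWithOuterReference(): Unit = {
    // `prefix` here is really `this.prefix`, so the compiled closure keeps an
    // $outer reference to Owner; once checkpointing forces the DStream graph
    // (including this closure) to be serialized, it fails like my exception.
    stream.foreachRDD(rdd => println(s"$prefix count=${rdd.count()}"))
  }
}

Is that the right way to think about it, i.e. without checkpointing only the objects actually captured by closures shipped to executors must be serializable, while with checkpointing the foreachRDD function itself is serialized as part of the DStream graph?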
Serialization stack:
- object not serializable (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent, value: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent@2a92a7fd)
- field (class: com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, name: $outer, type: class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent)
- object (class com.testdm.spark.streaming.etl.common.pipeline.EventhubsStateTransformComponent$$anonfun$process$1, <function1>)
- field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, name: cleanedF$1, type: interface scala.Function1)
- object (class org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, <function2>)
- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
- object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@3e1cb83b)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 16)
- field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.ArrayBuffer, ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@3e1cb83b, org.apache.spark.streaming.dstream.ForEachDStream@46034134))
- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData, [
0 checkpoint files])
- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
- object (class org.apache.spark.streaming.dstream.PluggableInputDStream, org.apache.spark.streaming.dstream.PluggableInputDStream@5066ad14)
- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData
//....