Most of the non-serializable questions online involve basic data passed into sc.parallelize(), with the serialization failure showing up in a map step. My problem is different: the type itself is the issue. I have a specific data type that comes from a third-party library and is not serializable. So this code threw a NotSerializableException:
val data: RDD[ThirdPartyLib.models.XData] = sc.parallelize(ThirdPartyLib.getX)
data.foreachPartition(rows => {
  rows.foreach(row => {
    println("value: " + row.getValue)
  })
})
As a solution, I created the same model class (XData) internally, made it serializable, and did this:
val data: RDD[XData] = (sc.parallelize(ThirdPartyLib.getX)).asInstanceOf[RDD[XData]]
data.foreachPartition(rows => {
  rows.foreach(row => {
    println("value: " + row.getValue)
  })
})
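For reference, my internal model class looks roughly like this (a simplified sketch; the field name is illustrative, and the real class mirrors ThirdPartyLib.models.XData):

// Simplified sketch of the internal class I created; the only
// difference from the third-party model is that it is Serializable.
class XData(val value: String) extends Serializable {
  def getValue: String = value
}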
I was expecting the issue to be resolved, but I am still getting the same error:

[ERROR] org.apache.spark.util.Utils logError - Exception encountered
java.io.NotSerializableException: ThirdPartyLib.models.XData

Shouldn't the problem be resolved now that I have created an internal serializable type? How can I fix this?
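Or does asInstanceOf only change the compile-time type without converting the underlying objects, so that the RDD still holds ThirdPartyLib.models.XData instances at runtime? In that case, would I need to map each element into my own class explicitly before parallelizing, something like the untested sketch below? (It assumes ThirdPartyLib.getX returns a Seq and that getValue is the only field I need, as in my snippets above.)

// Untested sketch: convert each third-party object into my own
// serializable XData on the driver, before parallelizing, so the
// non-serializable class never has to be shipped to executors.
val converted: Seq[XData] = ThirdPartyLib.getX.map(x => new XData(x.getValue))
val data: RDD[XData] = sc.parallelize(converted)

data.foreachPartition(rows => {
  rows.foreach(row => {
    println("value: " + row.getValue)
  })
})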