I have a Spark job that had been working well until, a few days ago, I needed to enable Kryo serialization:
spark.kryo.registrationRequired true
spark.kryo.referenceTracking true
spark.kryo.registrator org.mycompany.serialization.MyKryoRegistrator
spark.serializer org.apache.spark.serializer.KryoSerializer
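MyKryoRegistrator is the usual KryoRegistrator subclass (needs Spark and Kryo on the classpath), roughly:

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // registrations go here (see below)
  }
}
```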
Now it complains that it cannot find registered classes. I registered them like this:
def registerByName(kryo: Kryo, name: String) = kryo.register(Class.forName(name))
registerByName(kryo, "org.apache.spark.util.collection.BitSet")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashSet")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashSet$Hasher")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashMap")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashMap$mcJ$sp")
After this, it complains with:
com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840
Note: To register this class use: kryo.register(org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840.class
But if I try to register
registerByName(kryo, "org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840")
it throws java.lang.ClassNotFoundException
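As far as I can tell, that name can never resolve: the `/0x...` suffix Kryo prints is part of the JVM's display name for the synthetic lambda class, not a loadable binary class name, so Class.forName rejects it regardless of Spark. A minimal check (no Spark involved; `canLoad` is just a throwaway helper):

```scala
// Names with the "/0x..." suffix that Kryo prints for synthetic lambda
// classes are not loadable class names, so registering them via
// Class.forName can never succeed.
object LambdaRegistration {
  def canLoad(name: String): Boolean =
    try { Class.forName(name); true }
    catch { case _: ClassNotFoundException => false }

  def main(args: Array[String]): Unit = {
    println(canLoad("java.lang.String"))  // true
    println(canLoad(
      "org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840"))  // false
  }
}
```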
OpenHashMap is a private[spark] generic class used somewhere deep inside Spark, and it seems that once Kryo is enabled, Spark offloads serialization of its internals to Kryo as well. If it were my own class I would write a custom serializer, but I have no idea what I can do in this situation.
The problematic class definition: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala