
I have a Spark job that had been working well until a few days ago, when I needed to enable Kryo serialization.

spark.kryo.registrationRequired true
spark.kryo.referenceTracking true
spark.kryo.registrator org.mycompany.serialization.MyKryoRegistrator
spark.serializer org.apache.spark.serializer.KryoSerializer

Now it has started to complain that it cannot find registered classes. I registered them like this:

def registerByName(kryo: Kryo, name: String) = kryo.register(Class.forName(name))

registerByName(kryo, "org.apache.spark.util.collection.BitSet")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashSet")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashSet$Hasher")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashMap")
registerByName(kryo, "org.apache.spark.util.collection.OpenHashMap$mcJ$sp")

After this it complains with:

com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840
Note: To register this class use: kryo.register(org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840.class

But if I try to register

registerByName(kryo, "org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840")

it throws java.lang.ClassNotFoundException.
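That ClassNotFoundException seems expected: a name like `OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840` is the JVM's runtime display name for a synthetic lambda class, and the `/0x...` suffix is not part of a valid binary class name, so `Class.forName` can never resolve it. A minimal sketch (plain Scala, no Spark needed; `loadable` is a hypothetical helper, not a Spark API) illustrating the difference:

```scala
// Hypothetical helper: true if the name resolves via Class.forName.
def loadable(name: String): Boolean =
  try { Class.forName(name); true }
  catch { case _: ClassNotFoundException => false }

// An ordinary binary class name resolves fine:
println(loadable("java.lang.String")) // true

// The name from the Kryo error is a runtime display name of a
// synthetic lambda class, not a loadable binary name:
println(loadable(
  "org.apache.spark.util.collection.OpenHashMap$mcJ$sp$$Lambda$1429/0x0000000800cd3840")) // false
```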

The OpenHashMap class is a private[spark] Scala generic class used somewhere in the depths of Spark, and it seems that once Kryo is enabled, Spark offloads all serialization of its internals to Kryo as well. If it were my class, I would write a custom serializer, but I have no idea what I can do in this situation.

The problematic class definition: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala

Alex S
  • What do you use kryo serialization for? – MaxG Oct 28 '21 at 19:00
  • @MaxG we started using a 3rd-party framework in our job which offers a lot of ready-to-use algorithms adapted for Spark. The framework introduces a lot of classes with special KryoSerializers and a KryoSerializerRegistrar, so in order to use it I have no choice but to enable Kryo. The framework-related things work well, but our own code started failing. – Alex S Oct 28 '21 at 19:04
  • Do you have to pass Spark-specialized data structures to this 3rd-party framework? Can't you transform them to normal Scala maps/sets before you serialize them, for example? – MaxG Oct 28 '21 at 19:11
  • @MaxG they are normal Scala types, but the problem is not even related to the 3rd-party framework itself. It is something in the depths of Spark; I don't use OpenHashMap, and the 3rd-party framework does not use it either. – Alex S Oct 28 '21 at 19:22
  • @AlexS were you ever able to figure this out? I'm having a nearly identical issue: https://stackoverflow.com/questions/73964881/spark-3-kryoserializer-issue-unable-to-find-class-org-apache-spark-util-colle. So far no leads. I'm considering removing the usage of the KryoSerializer just for this section of code. – Zachary Steudel Oct 05 '22 at 18:47

0 Answers