I am trying to read data from Avro files into an RDD using Kryo. My code compiles fine, but at runtime I'm getting a ClassCastException. Here is what my code does:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.serializer.KryoSerializer;

SparkConf conf = new SparkConf()...
conf.set("spark.serializer", KryoSerializer.class.getCanonicalName());
conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName());
JavaSparkContext sc = new JavaSparkContext(conf);
Where MyKryoRegistrator registers a Serializer for MyCustomClass:
public class MyKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
    }
}
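For reference, MyCustomClassSerializer extends Kryo's Serializer. It's shaped roughly like this (simplified to the one field mentioned later in this question; the setSomeCustomField setter name is illustrative):

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

public class MyCustomClassSerializer extends Serializer<MyCustomClass> {
    @Override
    public void write(Kryo kryo, Output output, MyCustomClass object) {
        // Simplified: only the field used later in this question
        output.writeString(object.getSomeCustomField());
    }

    @Override
    public MyCustomClass read(Kryo kryo, Input input, Class<MyCustomClass> type) {
        MyCustomClass object = new MyCustomClass();
        object.setSomeCustomField(input.readString()); // setter name assumed
        return object;
    }
}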
Then, I read my datafile:
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

JavaPairRDD<MyCustomClass, NullWritable> records =
        sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                AvroKeyInputFormat.class, MyCustomClass.class, NullWritable.class,
                sc.hadoopConfiguration());
Tuple2<MyCustomClass, NullWritable> first = records.first();
This seems to work fine, but with a debugger I can see that, while the RDD has a kClassTag of my.package.containing.MyCustomClass, the variable first contains a Tuple2<AvroKey, NullWritable>, not a Tuple2<MyCustomClass, NullWritable>! And indeed, when the following line executes:
System.out.println("Got a result, custom field is: " + first._1.getSomeCustomField());
I get an exception:
java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to my.package.containing.MyCustomClass
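The mismatch shows up even without a debugger: reading the tuple through a wildcard Tuple2<?, ?> avoids the implicit cast the compiler inserts, so this check (just a diagnostic sketch) runs instead of throwing:

// Accessing the key through Tuple2<?, ?> skips the generated checkcast,
// so this prints "org.apache.avro.mapred.AvroKey", not MyCustomClass.
Tuple2<?, ?> raw = records.first();
System.out.println(raw._1().getClass().getName());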
Am I doing something wrong? And even if I am, shouldn't I get a compilation error rather than a runtime error?