I have the following case class:
case class Data[T](field1: String, field2: T)
I'm using the Kryo serializer with the following implicits for it:
import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}

implicit def single[A](implicit c: ClassTag[A]): Encoder[A] = Encoders.kryo[A](c)
implicit def tuple2[A1, A2](implicit e1: Encoder[A1], e2: Encoder[A2]): Encoder[(A1, A2)] =
  Encoders.tuple[A1, A2](e1, e2)
...
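(For context, this is how I understand the implicits above to be picked up when I build the datasets below; a minimal sketch with a hypothetical bean class standing in for my real T, assuming nothing more specific is in scope:)

import org.apache.spark.sql.Encoder

// Hypothetical payload class used only for illustration; not my real T.
class SomeBean extends Serializable { var x: Int = 0 }

// With single/tuple2 in scope, the encoder for Data[SomeBean] is supplied by single,
// i.e. it is the Kryo-based encoder returned by Encoders.kryo.
val dataEncoder: Encoder[Data[SomeBean]] = implicitly[Encoder[Data[SomeBean]]]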
And I tried to perform the following join:
val ds1 = someDataframe1.as[(String, T)].map(row => Data(row._1, row._2))
val ds2 = someDataframe2.as[(String, T)].map(row => Data(row._1, row._2))
ds1.joinWith(ds2, col("field1") === col("field1"), "left_outer")
After that I got the following exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'field1' given input columns: [value, value];
What happened to the column names in my datasets?
UPD:
When I called ds1.schema, I got the following output:
StructField(name = value, dataType = BinaryType, nullable = true)
I think the problem is with the Kryo serialization: there is no schema metadata, so the case class is serialized as just a single blob field without a name. I also noticed that everything works fine when T is a Kryo-known class (Int, String) or a case class, but when T is some Java bean, the schema of my Data dataset is a single unnamed blob field.
Spark version 1.6.1
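To illustrate what I mean (a sketch with a hypothetical bean class, not my real code): a Kryo encoder describes everything as a single binary value column, while, for comparison, a bean encoder built with Encoders.bean keeps the field names:

import scala.beans.BeanProperty
import org.apache.spark.sql.Encoders

// Hypothetical Java-bean-style class standing in for T (assumption).
class SomeJavaBean extends Serializable {
  @BeanProperty var x: Int = 0
}

// Kryo: the whole object is stored as one BinaryType column named "value",
// which matches ds1.schema above and explains why col("field1") cannot be resolved.
println(Encoders.kryo[SomeJavaBean].schema)
// e.g. StructType(StructField(value,BinaryType,true))

// Bean encoder: the schema is derived from getters/setters, so field names survive.
println(Encoders.bean(classOf[SomeJavaBean]).schema)
// e.g. StructType(StructField(x,IntegerType,false))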