I have a Dataset that I wish to convert to a typed Dataset, where the type is a case class having Option for several parameters. For example, using the Spark shell I create a case class, an encoder, and a (raw) Dataset:
case class Analogue(id: Long, t1: Option[Double] = None, t2: Option[Double] = None)
val df = Seq((1, 34.0), (2, 3.4)).toDF("id", "t1")
implicit val analogueChannelEncoder: Encoder[Analogue] = Encoders.product[Analogue]
I want to create a Dataset[Analogue] from df, so I try:
df.as(analogueChannelEncoder)
But this fails with the error:
org.apache.spark.sql.AnalysisException: cannot resolve '`t2`' given input columns: [id, t1];
Looking at the schemas of df and analogueChannelEncoder, the difference is apparent:
scala> df.schema
res3: org.apache.spark.sql.types.StructType = StructType(StructField(id,IntegerType,false), StructField(t1,DoubleType,false))
scala> analogueChannelEncoder.schema
res4: org.apache.spark.sql.types.StructType = StructType(StructField(id,LongType,false), StructField(t1,DoubleType,true), StructField(t2,DoubleType,true))
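For context, here is the kind of manual, hard-coded workaround I would like to avoid (a sketch only; it assumes the case class and implicit encoder defined above, and writes the casts and missing columns by hand):

```scala
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.{DoubleType, LongType}

// Align df's schema with the encoder's schema by hand:
// cast id from Int to Long, and add the missing t2 column as a null Double.
val aligned = df
  .withColumn("id", col("id").cast(LongType))
  .withColumn("t2", lit(null).cast(DoubleType))

// Now the conversion succeeds, but the column names and types are hard-coded.
val ds = aligned.as[Analogue]
```

This works for this one case class, but I am looking for a general way to do the conversion without enumerating every missing column and cast myself.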
I have seen this answer, but it will not work for me because my Dataset is assembled programmatically and is not a straightforward load from a data source.
How can I cast my untyped Dataset[Row] to Dataset[Analogue]?