I have messages in a non-standard Kafka format, so my code looks like the following:
val df: Dataset[String] = spark
  .readStream
  .format("kafka")
  .option("subscribe", topic)
  .options(kafkaParams)
  .load()
  .select($"value".as[Array[Byte]])
  .map { v =>
    // decode the custom envelope, then the Avro datum inside it
    val e = MyAvroSchema.decodeEnvelope(v)
    val d = MyAvroSchema.decodeDatum(e)
    d
  }
At this point, d is a string that represents a CSV line, for example:
2018-01-02,user8,campaing1,type6,...
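Since the decoded datum is just a comma-separated line, each record can be split positionally into fields; a minimal sketch (the sample line below is taken from the question, truncated to four fields):

```scala
object CsvLineDemo {
  def main(args: Array[String]): Unit = {
    val d = "2018-01-02,user8,campaing1,type6"
    // the -1 limit keeps trailing empty fields, which matters
    // when a wide (~85 column) row ends with empty values
    val fields = d.split(",", -1)
    println(fields.length) // 4
    println(fields(1))     // user8
  }
}
```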
Assuming that I can create a csvSchema: StructType, how can I convert the Dataset[String] into a DataFrame (i.e. Dataset[Row]) with csvSchema applied? One complication is that the schema is large (about 85 columns), so creating a case class or tuple is not really an option.
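For reference, here is a sketch of two schema-driven approaches that avoid a case class entirely. Assumptions (not from the question): `spark` is the active SparkSession, `df` is the Dataset[String] produced by the map above (its single column is named "value"), and `csvSchema` is the 85-column StructType. Untested against a real stream:

```scala
import org.apache.spark.sql.functions.{from_csv, split}
import spark.implicits._

// Spark 3.0+: parse the CSV string column directly against the schema.
val parsed = df
  .select(from_csv($"value", csvSchema, Map.empty[String, String]).as("row"))
  .select("row.*") // flatten the struct into top-level columns

// Spark 2.x fallback (no from_csv): split positionally and cast each
// field according to the schema; works on a streaming Dataset too.
val cols = csvSchema.fields.zipWithIndex.map { case (f, i) =>
  split($"value", ",").getItem(i).cast(f.dataType).as(f.name)
}
val parsed2 = df.select(cols: _*)
```

Both variants derive the columns from csvSchema itself, so the 85 fields never have to be written out by hand.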