
I have messages in a non-standard Kafka format, so the code looks like the following:

 import org.apache.spark.sql.Dataset
 import spark.implicits._   // for $"..." and the Dataset[String] encoder

 val df: Dataset[String] = spark
  .readStream
  .format("kafka")
  .option("subscribe", topic)
  .options(kafkaParams)
  .load()
  .select($"value".as[Array[Byte]])          // raw Kafka message bytes
  .map { v =>
    val e = MyAvroSchema.decodeEnvelope(v)   // strip the custom envelope
    val d = MyAvroSchema.decodeDatum(e)      // decode the Avro datum into a String
    d
  }

At this point d is a string that represents a CSV line, for example:

2018-01-02,user8,campaign1,type6,...

Assume that I can create a csvSchema: StructType, along the lines of the sketch below.

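A minimal sketch of such a schema (the column names here are only placeholders, not my real ~85 fields):

 import org.apache.spark.sql.types.{StringType, StructField, StructType}

 // Placeholder column names; the real schema has about 85 fields.
 val columnNames = Seq("event_date", "user", "campaign", "type")

 // All fields as nullable strings for simplicity; types could be tightened per column.
 val csvSchema: StructType = StructType(
   columnNames.map(name => StructField(name, StringType, nullable = true))
 )
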
Given that, how can I convert df to a DataFrame (Dataset[Row]) with csvSchema? One complication is that the schema is large (about 85 columns), so creating a case class or a tuple is not really an option.
