I have messages in a non-standard Kafka format, so my code looks like the following:
val df: Dataset[String] = spark
  .readStream
  .format("kafka")
  .option("subscribe", topic)
  .options(kafkaParams)
  .load()
  .select($"value".as[Array[Byte]])
  .map { v =>
    // decode the custom envelope, then the Avro datum inside it
    val e = MyAvroSchema.decodeEnvelope(v)
    val d = MyAvroSchema.decodeDatum(e)
    d
  }
At this point, d is a string that represents a CSV line, for example:
2018-01-02,user8,campaing1,type6,...
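Since the decoded datum is just a comma-separated line, each record can be split positionally into fields; a minimal sketch (the sample line below is taken from the question, truncated to four fields):

```scala
object CsvLineDemo {
  def main(args: Array[String]): Unit = {
    val d = "2018-01-02,user8,campaing1,type6"
    // the -1 limit keeps trailing empty fields, which matters
    // when a wide (~85 column) row ends with empty values
    val fields = d.split(",", -1)
    println(fields.length) // 4
    println(fields(1))     // user8
  }
}
```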
Assuming that I can create a csvSchema: StructType, how can I convert the Dataset[String] into a DataFrame (i.e. Dataset[Row]) with csvSchema applied? One complication is that the schema is large (about 85 columns), so creating a case class or tuple is not really an option.
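For reference, here is a sketch of two schema-driven approaches that avoid a case class entirely. Assumptions (not from the question): `spark` is the active SparkSession, `df` is the Dataset[String] produced by the map above (its single column is named "value"), and `csvSchema` is the 85-column StructType. Untested against a real stream:

```scala
import org.apache.spark.sql.functions.{from_csv, split}
import spark.implicits._

// Spark 3.0+: parse the CSV string column directly against the schema.
val parsed = df
  .select(from_csv($"value", csvSchema, Map.empty[String, String]).as("row"))
  .select("row.*") // flatten the struct into top-level columns

// Spark 2.x fallback (no from_csv): split positionally and cast each
// field according to the schema; works on a streaming Dataset too.
val cols = csvSchema.fields.zipWithIndex.map { case (f, i) =>
  split($"value", ",").getItem(i).cast(f.dataType).as(f.name)
}
val parsed2 = df.select(cols: _*)
```

Both variants derive the columns from csvSchema itself, so the 85 fields never have to be written out by hand.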