
I have this:

import spark.implicits._
import org.apache.spark.sql.{Dataset, Encoder, Encoders, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val mydata: Dataset[Row] = spark.read.format("csv").option("header", true).option("inferSchema", true).load("mydata.csv")
// CSV header: Time,Area,City
// CSV values: "2016-01","A1","NY"
//             "2016-01","AB","HK" etc

// ...somewhere in my aggregate:
def bufferEncoder: Encoder[Array[(String, Row)]] = ....

For the inner tuple in the Array I can write:

val rowEncoder = RowEncoder(mydata.schema)
Encoders.tuple(Encoders.STRING, rowEncoder)

but how can I write the Encoder for the outer Array?


1 Answer


You'll need to either use RowEncoder for the complete structure:

import org.apache.spark.sql.types._

val enc = RowEncoder(StructType(Seq(
  StructField("data", ArrayType(
    StructType(Seq(
      StructField("k", StringType),
      StructField("v", mydata.schema))))))))

and convert the data to match this structure:

Row(Seq(Row(string, Row(...)), Row(string, Row(...))))
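
For concreteness, a minimal sketch of what one buffer value would look like under this encoding, using the sample rows from the question (the literal values and the idea that the aggregate's buffer becomes a plain Row are my assumptions):

import org.apache.spark.sql.Row

// One buffer value: an outer Row holding the "data" array, where each
// element is a (k, v) Row and v matches mydata.schema (Time, Area, City).
val buffer: Row = Row(Seq(
  Row("2016-01", Row("2016-01", "A1", "NY")),
  Row("2016-01", Row("2016-01", "AB", "HK"))
))

// The buffer type is then Row rather than Array[(String, Row)], so the
// aggregate would declare something like:
// def bufferEncoder: Encoder[Row] = enc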

or use statically typed encoders for all fields, as sketched below.
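
By "statically typed encoders" I mean replacing Row with a case class that mirrors the CSV columns, so that spark.implicits._ can derive the whole encoder, array included. A minimal sketch, assuming a hypothetical Record case class and the SparkSession in scope as spark, as in the question:

import org.apache.spark.sql.Encoder
import spark.implicits._

// Hypothetical case class mirroring the CSV columns Time, Area, City.
case class Record(Time: String, Area: String, City: String)

// With a concrete product type, the tuple and array encoders are
// derived implicitly instead of being built from a runtime schema.
def bufferEncoder: Encoder[Array[(String, Record)]] =
  implicitly[Encoder[Array[(String, Record)]]]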

  • I am not sure if I understood your answer. I am already able to convert the single `Row`. My problem is the `Array`. – Randomize Mar 11 '17 at 14:05