
Is there a way to convert a Row to JSON inside foreachPartition? I have looked at How to convert Row to json in Spark 2 Scala. However, that approach won't work here: I can't access sqlContext from within foreachPartition, and my data contains nested types.

dataframe.foreachPartition { partitionOfRecords =>

  ..

  val connectionString: ConnectionStringBuilder = new ConnectionStringBuilder(
    eventHubsNamespace,
    eventHubName,
    policyName,
    policyKey)

  val eventHubsClient: EventHubClient =
    EventHubClient.createFromConnectionString(connectionString.toString()).get()

  val json = /* CONVERT partitionOfRecords to JSON */

  val bytes = json.getBytes()
  val eventData = new EventData(bytes)
  eventHubsClient.send(eventData)
}

1 Answer


I'd strongly recommend doing the conversion to JSON before foreachPartition.

The reason is that the functions object has built-in support for JSON: the to_json function builds "stringified" JSON for you, without resorting to fairly involved custom serialization.

to_json(e: Column): Column Converts a column containing a StructType or ArrayType of StructTypes into a JSON string with the specified schema.
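For example, to_json handles nested types (arrays, structs) out of the box. A minimal sketch, assuming a SparkSession named spark is in scope (e.g. spark-shell) and using made-up column names:

import org.apache.spark.sql.functions.{struct, to_json}
import spark.implicits._

// Hypothetical data with a nested array column, for illustration only.
val df = Seq((1, "alice", Seq("a@x.com", "a@y.com")))
  .toDF("id", "name", "emails")

// Pack the columns into a struct and serialize it, nested array included.
df.select(to_json(struct($"id", $"name", $"emails")) as "json").show(false)
// +------------------------------------------------------+
// |json                                                  |
// +------------------------------------------------------+
// |{"id":1,"name":"alice","emails":["a@x.com","a@y.com"]}|
// +------------------------------------------------------+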

I'd recommend doing the following:

dataframe.
  select(to_json($"your-struct-column-here")).
  as[String].
  foreachPartition { jsons: Iterator[String] => ... }

Note that foreachPartition hands you an Iterator[String] per partition, not a single String.
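Putting the pieces together with the Event Hubs client from the question, a rough sketch could look like the following. The connection parameters (eventHubsNamespace, eventHubName, policyName, policyKey) come from the question's scope, spark.implicits is assumed available, and sending one event per record with one client per partition is an assumption on my part, not a requirement:

import com.microsoft.azure.eventhubs.{ConnectionStringBuilder, EventData, EventHubClient}
import org.apache.spark.sql.functions.{col, struct, to_json}
import spark.implicits._

dataframe.
  select(to_json(struct(dataframe.columns.map(col): _*)) as "json").
  as[String].
  foreachPartition { jsons: Iterator[String] =>
    // Build one client per partition; this closure runs on the executor.
    val connectionString = new ConnectionStringBuilder(
      eventHubsNamespace, eventHubName, policyName, policyKey)
    val client = EventHubClient.createFromConnectionString(connectionString.toString).get()

    // One event per JSON record; batching per partition may be cheaper.
    jsons.foreach(json => client.send(new EventData(json.getBytes("UTF-8"))))

    client.closeSync()
  }

Since to_json is a Catalyst expression, the serialization happens on the executors as part of the select, so nothing from sqlContext needs to be captured in the closure.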