
We are using Spark Structured Streaming 2.4.1 to process events from Kafka into Cassandra. Each event is a nested JSON document, and we need to flatten it before loading it into a Cassandra table.

I tried to use pivot on the DataFrame, but it throws the error message below. Could someone please help me resolve this issue?

JSON event structure:

{
  "event_name": "some event",
  "groups": [
    {
      "data_group_name": "personname",
      "fields": [
        {
          "col_name": "firstname",
          "value": "John"
        },
        {
          "col_name": "lastname",
          "value": "williams"
        }
      ]
    },
    {
      "data_group_name": "contact",
      "fields": [
        {
          "col_name": "mobile",
          "value": "1234567890"
        },
        {
          "col_name": "home",
          "value": "0987654321"
        }
      ]
    }
  ]
}
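To make the goal concrete (setting Spark aside for a moment), the flattening we are after can be sketched in plain Python; `flatten_event` is a hypothetical helper for illustration, not part of our pipeline:

```python
import json

def flatten_event(event: dict) -> dict:
    """Flatten one event into a single row: event_name plus one
    column per col_name/value pair found in any group."""
    row = {"event_name": event["event_name"]}
    for group in event.get("groups", []):
        for field in group.get("fields", []):
            # each field contributes one column to the flat row
            row[field["col_name"]] = field["value"]
    return row

event = json.loads("""
{
  "event_name": "some event",
  "groups": [
    {"data_group_name": "personname",
     "fields": [{"col_name": "firstname", "value": "John"},
                {"col_name": "lastname", "value": "williams"}]},
    {"data_group_name": "contact",
     "fields": [{"col_name": "mobile", "value": "1234567890"},
                {"col_name": "home", "value": "0987654321"}]}
  ]
}
""")
print(flatten_event(event))
```

In Spark terms this corresponds to exploding `groups` and `fields` and then pivoting `col_name` into columns.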

df.groupBy($"event_name").pivot($"col_name").agg(first($"value"))

Expected result:

event_name   firstname   lastname   mobile       home
------------------------------------------------------------
some event   John        williams   1234567890   0987654321

Error message:

Queries with streaming sources must be executed with writeStream.start();;
kafka
at org.apache.spark.sql.catalyst.analysis.UnsupportedOperationChecker$.org$apache$spark$sql$catalyst$analysis$UnsupportedOperationChecker$$throwError(UnsupportedOperationChecker.scala:389)
